Abstract. We study nondeterministic strategies in parity games with the aim of computing a most permissive winning strategy. Following earlier work, we measure permissiveness in terms of the average number/weight of transitions blocked by a strategy. Using a translation into mean-payoff parity games, we prove that deciding (the permissiveness of) a most permissive winning strategy is in NP ∩ coNP. Along the way, we provide a new study of mean-payoff parity games. In particular, we give a new algorithm for solving these games, which beats all previously known algorithms for this problem.

1 Introduction 

Games extend the usual semantics of finite automata from one to several players, thus allowing one to model interactions between agents acting on the progression of the automaton. This has proved very useful in computer science, especially for the formal verification of open systems interacting with their environment [21]. In this setting, the aim is to synthesise a controller under which the system behaves according to a given specification, whatever the environment does. Usually, this is modelled as a game between two players: Player 1 represents the controller and Player 2 represents the environment. The goal is then to find a winning strategy for Player 1, i.e. a recipe stating how the system should react to any possible action of the environment in order to meet its specification.

In this paper, we consider multi-strategies (or nondeterministic strategies, cf. […]) as a generalisation of strategies: while strategies select only one possible action to be played in response to the behaviour of the environment, multi-strategies can retain several possible actions. Allowing several moves provides a way to cope with errors (e.g., actions being disabled for a short period, or timing imprecisions in timed games). Another quality of multi-strategies is their ability to be combined with other multi-strategies, yielding a refined multi-strategy that is ideally winning for all of the original specifications. This offers a modular approach to solving games.
Classically, a strategy is more permissive than another one if it allows more behaviours. Under this notion, a most permissive winning strategy need not exist [1]. Hence, we follow a different approach, which is of a quantitative nature: we provide a measure that specifies how permissive a given multi-strategy is. To do so, we consider weighted games, where each edge is equipped with a weight, which we treat as a penalty that is incurred when this edge is disallowed. The penalty of a multi-strategy is then defined as the average, in the limit, of the sum of penalties incurred in each step. The lower this penalty, the more permissive the multi-strategy. Our aim is to find one of the most permissive multi-strategies achieving a given objective.

We deal with multi-strategies by transforming a game with penalties into a mean-payoff game [24] with classical (deterministic) strategies. A move in the latter game corresponds to a set of moves in the former and is assigned a (negative) reward depending on the penalty of the original move. The penalty of a multi-strategy in the original game equals the opposite of the payoff achieved by the corresponding strategy in the mean-payoff game. In previous work, Bouyer et al. [3] introduced the notion of penalties and showed how to compute permissive strategies with respect to reachability objectives. We extend the study of [3] to parity objectives. This is a significant extension because parity objectives can express infinitary specifications. Using the above transformation, we reduce the problem of finding a most permissive strategy in a parity game with penalties to that of computing an optimal strategy in a mean-payoff parity game, which combines a mean-payoff objective with a parity objective.

While mean-payoff parity games have already been studied, we propose a new proof that these games are determined and that both players have optimal strategies. Moreover, we prove that the second player not only has an optimal strategy with finite memory, but one that uses no memory at all. Finally, we provide a new algorithm for computing the values of a mean-payoff parity game, which is faster than the best known algorithms for this problem; its running time is exponential in the number of priorities and polynomial in the size of the game graph and the largest absolute weight.

In the second part of this paper, we present our results on parity games with penalties. In particular, we prove the existence of most permissive multi-strategies, and we show that the existence of a multi-strategy whose penalty is less than a given threshold can be decided in NP ∩ coNP. Finally, we adapt our deterministic algorithm for mean-payoff parity games to parity games with penalties. Our algorithm computes the penalty of a most permissive multi-strategy in time exponential in the number of priorities and polynomial in the size of the game graph and the largest penalty.

Related work. Penalties as we use them were defined in [3]. Other notions of permissiveness have been defined in […], but these notions have the drawback that a most permissive strategy might not exist. Multi-strategies have also been used for different purposes in [17].

The parity condition goes back to [121 [H] and is fundamental for verification. 
Parity games admit optimal memoryless strategies for both players, and the problem of deciding the winner is in NP ∩ coNP. As of this writing, it is not known whether parity games can be solved in polynomial time; the best known algorithms run in time polynomial in the size of the game graph but exponential in the number of priorities.

Another fundamental class of games are games with quantitative objectives. Mean-payoff games, where the aim is to maximise the average weight of the transitions taken in a play, are also in NP ∩ coNP and admit memoryless optimal strategies [24]. The same is true for energy games, where the aim is to always keep the sum of the weights above a given threshold [5, 4]. In fact, parity games can easily be reduced to mean-payoff or energy games.



Finally, several game models mixing several qualitative or quantitative objectives have recently appeared in the literature: apart from mean-payoff parity games, these include generalised parity games [9], energy parity games [6] and lexicographic mean-payoff (parity) games [2], as well as generalised energy and mean-payoff games [7].

2 Preliminaries 

A weighted game graph is a tuple $G = (Q_1, Q_2, E, \mathrm{weight})$, where $Q := Q_1 \cup Q_2$ is a finite set of states, $E \subseteq Q \times Q$ is the edge or transition relation, and $\mathrm{weight} \colon E \to \mathbb{R}$ is a function assigning a weight to every transition. When weighted game graphs are subject to algorithmic processing, we assume that these weights are integers; in this case, we set $W := \max\{1, |\mathrm{weight}(e)| \mid e \in E\}$.

Moreover, we define the size of $G$, denoted by $\|G\|$, as $|Q| + |E| \cdot \lceil \log_2 W \rceil$. (Up to a linear factor, $\|G\|$ is the length of a binary encoding of $G$.) In the same spirit, the size $\|x\|$ of a rational number $x$ equals the total length of the binary representations of its numerator and its denominator.
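For concreteness, both size measures can be computed as follows. This is a minimal sketch under our own assumptions: the function names `game_size` and `rational_size` are hypothetical, the edges are passed simply as a list of integer weights, and the binary length of 0 is taken to be one bit (the text does not fix this corner case).

```python
from fractions import Fraction

def game_size(num_states: int, edge_weights: list[int]) -> int:
    """||G|| = |Q| + |E| * ceil(log2 W), where W = max{1, |weight(e)|}."""
    W = max([1] + [abs(w) for w in edge_weights])
    ceil_log2_W = (W - 1).bit_length()  # equals ceil(log2 W) for every W >= 1
    return num_states + len(edge_weights) * ceil_log2_W

def rational_size(x: Fraction) -> int:
    """||x||: binary length of the numerator plus that of the denominator."""
    x = Fraction(x)
    num_bits = max(1, abs(x.numerator).bit_length())  # assumption: sign ignored, 0 uses 1 bit
    den_bits = max(1, x.denominator.bit_length())
    return num_bits + den_bits

print(game_size(3, [1, -5, 2]))       # W = 5, ceil(log2 5) = 3, so 3 + 3*3 = 12
print(rational_size(Fraction(3, 4)))  # '11' and '100': 2 + 3 = 5
```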

For $q \in Q$, we write $qE$ for the set $\{q' \in Q \mid (q, q') \in E\}$ of all successors of $q$. We require that $qE \neq \emptyset$ for all states $q \in Q$. A subset $S \subseteq Q$ is a subarena of $G$ if $qE \cap S \neq \emptyset$ for all states $q \in S$. If $S \subseteq Q$ is a subarena of $G$, then we can restrict $G$ to states in $S$, in which case we obtain the weighted game graph $G \restriction S := (Q_1 \cap S, Q_2 \cap S, E \cap (S \times S), \mathrm{weight} \restriction S \times S)$.

A play of $G$ is an infinite sequence $\rho = \rho(0)\rho(1)\cdots \in Q^\omega$ of states such that $(\rho(i), \rho(i+1)) \in E$ for all $i \in \mathbb{N}$. We denote by $\mathrm{Out}^G(q)$ the set of all plays $\rho$ with $\rho(0) = q$ and by $\mathrm{Inf}(\rho)$ the set of states occurring infinitely often in $\rho$.

A play prefix or history $\gamma = \gamma(0)\gamma(1)\cdots\gamma(n) \in Q^+$ is a finite, nonempty prefix of a play. For a play or a history $\rho$ and $j < k \in \mathbb{N}$, we denote by $\rho[j, k) := \rho[j, k-1] := \rho(j)\cdots\rho(k-1)$ its infix that starts at position $j$ and ends at position $k-1$; the play's suffix $\rho(j)\rho(j+1)\cdots$ is denoted by $\rho[j, \infty)$.

Strategies. A (deterministic) strategy for Player $i$ in $G$ is a function $\sigma \colon Q^* Q_i \to Q$ such that $\sigma(\gamma q) \in qE$ for all $\gamma \in Q^*$ and $q \in Q_i$. A strategy $\sigma$ is memoryless if $\sigma(\gamma q) = \sigma(q)$ for all $\gamma \in Q^*$ and $q \in Q_i$. More generally, a strategy $\sigma$ is finite-memory if the equivalence relation ${\sim} \subseteq Q^* \times Q^*$, defined by $\gamma_1 \sim \gamma_2$ if and only if $\sigma(\gamma_1 \cdot \gamma) = \sigma(\gamma_2 \cdot \gamma)$ for all $\gamma \in Q^* Q_i$, has finite index.
We say that a play $\rho$ of $G$ is consistent with a strategy $\sigma$ for Player $i$ if $\rho(k+1) = \sigma(\rho[0, k])$ for all $k \in \mathbb{N}$ with $\rho(k) \in Q_i$, and denote by $\mathrm{Out}^G(\sigma, q_0)$ the set of all plays $\rho$ of $G$ that are consistent with $\sigma$ and start in $\rho(0) = q_0$. Given a strategy $\sigma$ of Player 1, a strategy $\tau$ of Player 2, and a state $q_0 \in Q$, there exists a unique play $\rho \in \mathrm{Out}^G(\sigma, q_0) \cap \mathrm{Out}^G(\tau, q_0)$, which we denote by $\rho^G(\sigma, \tau, q_0)$.
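When both strategies are memoryless, the unique play $\rho^G(\sigma, \tau, q_0)$ is ultimately periodic: the current state alone determines the next one, so the play enters a cycle after at most $|Q|$ steps. The following Python sketch illustrates this; it assumes (our own encoding, not the paper's) that each memoryless strategy is a dictionary mapping the states of its player to the chosen successor, and it returns the play as a prefix followed by a cycle that is repeated forever.

```python
def outcome(sigma: dict, tau: dict, q0):
    """Return (prefix, cycle) such that the play is prefix + cycle + cycle + ...

    sigma: memoryless strategy of Player 1 (state in Q1 -> successor)
    tau:   memoryless strategy of Player 2 (state in Q2 -> successor)
    """
    first_visit = {}  # state -> position of its first occurrence
    play = []
    q = q0
    while q not in first_visit:
        first_visit[q] = len(play)
        play.append(q)
        q = sigma[q] if q in sigma else tau[q]  # the owner of q moves
    k = first_visit[q]  # the play re-enters the cycle at position k
    return play[:k], play[k:]

# Example: Player 1 owns a and c, Player 2 owns b.
print(outcome({"a": "b", "c": "b"}, {"b": "c"}, "a"))  # (['a'], ['b', 'c'])
```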

Traps and attractors. Intuitively, a subarena $T \subseteq Q$ of states is a trap for one of the two players if the other player can enforce that the play stays in this set. Formally, a trap for Player 2 (or simply a 2-trap) is a subarena $T \subseteq Q$ such that $qE \subseteq T$ for all states $q \in T \cap Q_2$, and $qE \cap T \neq \emptyset$ for all $q \in T \cap Q_1$. A trap for Player 1 (or 1-trap) is defined analogously. Note that if $T$ is a trap for Player $i$ in $G \restriction S$ and $S$ is a trap for Player $i$ in $G$, then $T$ is also a trap for Player $i$ in $G$.

If $T \subseteq Q$ is not a trap for Player 1, then Player 1 has a strategy to reach a position in $Q \setminus T$. In general, given a subset $S \subseteq Q$, we denote by $\mathrm{Attr}_1^G(S)$ the set of states from where Player 1 can force a visit to $S$. This set can be characterised as the limit of the sequence $(A^i)_{i \in \mathbb{N}}$ defined by $A^0 = S$ and

$$A^{i+1} = A^i \cup \{q \in Q_1 \mid qE \cap A^i \neq \emptyset\} \cup \{q \in Q_2 \mid qE \subseteq A^i\}.$$

From every state in $\mathrm{Attr}_1^G(S)$, Player 1 has a memoryless strategy $\sigma$ that guarantees a visit to $S$ in at most $|Q|$ steps: the strategy chooses for each state $q \in (A^i \setminus A^{i-1}) \cap Q_1$ a state $p \in qE \cap A^{i-1}$ (which decreases the distance to $S$ by 1). We call the set $\mathrm{Attr}_1^G(S) = \bigcup_{i \in \mathbb{N}} A^i$ the 1-attractor of $S$ and $\sigma$ an attractor strategy for $S$. The 2-attractor of a set $S$, denoted by $\mathrm{Attr}_2^G(S)$, and attractor strategies for Player 2 are defined symmetrically. Notice that for any set $S$, the set $Q \setminus \mathrm{Attr}_1^G(S)$ is a 1-trap, and if $S$ is a subarena (2-trap), then $\mathrm{Attr}_1^G(S)$ is also a subarena (2-trap). Analogously, $Q \setminus \mathrm{Attr}_2^G(S)$ is a 2-trap, and if $S$ is a subarena (1-trap), then $\mathrm{Attr}_2^G(S)$ is also a subarena (1-trap).
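The layered characterisation of $\mathrm{Attr}_1^G(S)$ translates directly into code. The following Python sketch (function name and graph encoding are our own assumptions) iterates $A^{i+1}$ from $A^0 = S$ until nothing changes and, for each newly added Player 1 state, records a successor in the previous layer, which yields a memoryless attractor strategy. This naive loop takes $O(|Q| \cdot |E|)$ time; it is meant as an illustration only.

```python
def attractor(S: set, succ: dict, Q1: set):
    """Compute Attr_1(S) together with a memoryless attractor strategy.

    succ maps every state to its list of successors; states not in Q1
    belong to Player 2.  Returns (attr, strategy), where strategy maps each
    Player 1 state of attr - S to a successor lying one layer closer to S.
    """
    attr = set(S)  # A^0 = S
    strategy = {}
    while True:
        layer = {}  # states of A^{i+1} - A^i, computed against A^i
        for q in succ:
            if q in attr:
                continue
            if q in Q1:
                good = [p for p in succ[q] if p in attr]
                if good:  # Player 1 can move into A^i
                    layer[q] = good[0]
            elif all(p in attr for p in succ[q]):  # Player 2 cannot avoid A^i
                layer[q] = None
        if not layer:  # fixed point reached: attr = Attr_1(S)
            return attr, strategy
        for q, p in layer.items():
            attr.add(q)
            if p is not None:
                strategy[q] = p

succ = {"a": ["b"], "b": ["c", "a"], "c": ["c"], "d": ["d", "a"]}
attr, strat = attractor({"c"}, succ, Q1={"a", "b"})
print(sorted(attr), strat)  # ['a', 'b', 'c'] {'b': 'c', 'a': 'b'}
```

Here the Player 2 state `d` stays outside the attractor because its self-loop lets Player 2 avoid `c` forever.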

Convention. We often drop the superscript $G$ from the expressions defined above if no confusion arises, e.g. by writing $\mathrm{Out}(\sigma, q_0)$ instead of $\mathrm{Out}^G(\sigma, q_0)$.

3 Mean-payoff parity games

In this first part of the paper, we show that mean-payoff parity games are determined, that both players have optimal strategies, that for Player 2 even memoryless strategies suffice, and that the value problem for mean-payoff parity games is in NP ∩ coNP. Furthermore, we present a deterministic algorithm that computes the values in time exponential in the number of priorities and runs in pseudo-polynomial time when the number of priorities is bounded.

3.1 Definitions 

Formally, a mean-payoff parity game is a tuple $\mathcal{G} = (G, \chi)$, where $G$ is a weighted game graph and $\chi \colon Q \to \mathbb{N}$ is a priority function assigning a priority to every state. A play $\rho = \rho(0)\rho(1)\cdots$ is parity-winning if the minimal priority occurring infinitely often in $\rho$ is even, i.e., if $\min\{\chi(q) \mid q \in \mathrm{Inf}(\rho)\} \equiv 0 \pmod 2$. All notions that we have defined for weighted game graphs carry over to mean-payoff parity games. In particular, a play of $\mathcal{G}$ is just a play of $G$, and a strategy for Player $i$ in $\mathcal{G}$ is nothing but a strategy for Player $i$ in $G$. Hence, we write $\mathrm{Out}^{\mathcal{G}}(\sigma, q)$ for $\mathrm{Out}^G(\sigma, q)$, and so on. As for weighted game graphs, we often omit the superscript if $\mathcal{G}$ is clear from the context. Finally, for a mean-payoff parity game $\mathcal{G} = (G, \chi)$ and a subarena $S$ of $G$, we write $\mathcal{G} \restriction S$ for the mean-payoff parity game $(G \restriction S, \chi \restriction S)$.

We say that a mean-payoff parity game $\mathcal{G} = (G, \chi)$ is a mean-payoff game if $\chi(q)$ is even for all $q \in Q$. In particular, given a weighted game graph $G$, we obtain a mean-payoff game by assigning priority 0 to all states. We denote this game by $(G, 0)$.

If $\chi(Q) \subseteq \{0, 1\}$, then we say that $\mathcal{G}$ is a mean-payoff Büchi game; if $\chi(Q) \subseteq \{1, 2\}$, we call it a mean-payoff co-Büchi game. Hence, in a Büchi game Player 1 needs to visit the set $\chi^{-1}(0)$ infinitely often, whereas in a co-Büchi game he has to visit the set $\chi^{-1}(1)$ only finitely often.

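For a play $\rho$ of $\mathcal{G}$, we define its payoff as

$$\mathrm{payoff}^{\mathcal{G}}(\rho) = \begin{cases} \liminf_{n \to \infty} \mathrm{payoff}_n^{\mathcal{G}}(\rho) & \text{if } \rho \text{ is parity-winning},\\ -\infty & \text{otherwise}, \end{cases}$$

where for $n \in \mathbb{N}$

$$\mathrm{payoff}_n^{\mathcal{G}}(\rho) = \begin{cases} \dfrac{1}{n} \displaystyle\sum_{i=0}^{n-1} \mathrm{weight}(\rho(i), \rho(i+1)) & \text{if } n > 0,\\ -\infty & \text{if } n = 0. \end{cases}$$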
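For an ultimately periodic play $\rho = u\,v^\omega$ (for instance, the outcome of two memoryless strategies), both parts of the definition can be evaluated exactly: $\mathrm{Inf}(\rho)$ is the set of states on the cycle $v$, and $\mathrm{payoff}_n(\rho)$ converges to the average weight of the cycle, so its lower limit equals that average. The following Python sketch is our own illustration; the encoding of the play as `prefix`/`cycle` lists and of the weights as a dictionary on edges are assumptions.

```python
from fractions import Fraction
import math

def lasso_payoff(prefix, cycle, weight, prio):
    """payoff(rho) for rho = prefix . cycle . cycle . ...

    weight: dict mapping (q, q') to the weight of that edge
    prio:   dict mapping each state to its priority
    The prefix is accepted for readability only: it affects neither
    Inf(rho) nor the limit of the running averages.
    """
    # Inf(rho) consists exactly of the states on the cycle.
    if min(prio[q] for q in cycle) % 2 == 1:
        return -math.inf  # not parity-winning
    # Sum the cycle's edges, including the closing edge back to cycle[0].
    edges = zip(cycle, cycle[1:] + cycle[:1])
    total = sum(weight[e] for e in edges)
    return Fraction(total, len(cycle))  # mean weight of the cycle

w = {("a", "b"): 5, ("b", "c"): 1, ("c", "b"): 2}
chi = {"a": 1, "b": 2, "c": 3}
print(lasso_payoff(["a"], ["b", "c"], w, chi))  # 3/2
```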



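If $\sigma$ is a strategy for Player 1 in $\mathcal{G}$, we define its value from $q_0 \in Q$ as

$$\mathrm{val}^{\mathcal{G}}(\sigma, q_0) = \inf_\tau \mathrm{payoff}^{\mathcal{G}}(\rho(\sigma, \tau, q_0)) = \inf\{\mathrm{payoff}^{\mathcal{G}}(\rho) \mid \rho \in \mathrm{Out}^{\mathcal{G}}(\sigma, q_0)\},$$

where $\tau$ ranges over all strategies of Player 2 in $\mathcal{G}$. Analogously, the value of a strategy $\tau$ for Player 2 from $q_0$ is defined as

$$\mathrm{val}^{\mathcal{G}}(\tau, q_0) = \sup_\sigma \mathrm{payoff}^{\mathcal{G}}(\rho(\sigma, \tau, q_0)) = \sup\{\mathrm{payoff}^{\mathcal{G}}(\rho) \mid \rho \in \mathrm{Out}^{\mathcal{G}}(\tau, q_0)\},$$

where $\sigma$ ranges over all strategies of Player 1 in $\mathcal{G}$. The lower and upper value of a state $q_0 \in Q$ are defined by

$$\underline{\mathrm{val}}^{\mathcal{G}}(q_0) = \sup_\sigma \mathrm{val}^{\mathcal{G}}(\sigma, q_0) \quad\text{and}\quad \overline{\mathrm{val}}^{\mathcal{G}}(q_0) = \inf_\tau \mathrm{val}^{\mathcal{G}}(\tau, q_0),$$

respectively. Intuitively, $\underline{\mathrm{val}}^{\mathcal{G}}(q_0)$ and $\overline{\mathrm{val}}^{\mathcal{G}}(q_0)$ are the maximal (respectively minimal) payoff that Player 1 (respectively Player 2) can ensure (in the limit). We say that a strategy $\sigma$ of Player 1 is optimal from $q_0$ if $\mathrm{val}^{\mathcal{G}}(\sigma, q_0) = \underline{\mathrm{val}}^{\mathcal{G}}(q_0)$. Analogously, we call a strategy $\tau$ of Player 2 optimal from $q_0$ if $\mathrm{val}^{\mathcal{G}}(\tau, q_0) = \overline{\mathrm{val}}^{\mathcal{G}}(q_0)$. A strategy is (globally) optimal if it is optimal from every state $q \in Q$. It is easy to see that $\underline{\mathrm{val}}^{\mathcal{G}}(q_0) \leq \overline{\mathrm{val}}^{\mathcal{G}}(q_0)$. If $\underline{\mathrm{val}}^{\mathcal{G}}(q_0) = \overline{\mathrm{val}}^{\mathcal{G}}(q_0)$, we say that $q_0$ has a value, which we denote by $\mathrm{val}^{\mathcal{G}}(q_0)$.

Fig. 1. A mean-payoff parity game for which infinite memory is necessary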

In the next section, we will see that mean-payoff parity games are determined, i.e., that every state has a value. The value problem is the following decision problem: Given a mean-payoff parity game $\mathcal{G}$ (with integral weights), a designated state $q_0 \in Q$, and a number $x \in \mathbb{Q}$, decide whether $\mathrm{val}^{\mathcal{G}}(q_0) \geq x$.

Example 1. Consider the mean-payoff parity game $\mathcal{G}$ depicted in Fig. 1, where each state is labelled with its priority and each edge with its weight; all states belong to Player 1. Note that $\mathrm{val}^{\mathcal{G}}(q_1) = 1$ since Player 1 can delay visiting $q_2$ longer and longer while still ensuring that this vertex is seen infinitely often. However, there is no finite-memory strategy that achieves this value.

Let $\sigma$ be a finite-memory strategy of Player 1 in $\mathcal{G}$, and let $\rho$ be the unique play of $\mathcal{G}$ that starts in $q_1$ and is consistent with $\sigma$. Assume furthermore that $\rho$ visits $q_2$ infinitely often (otherwise $\mathrm{val}^{\mathcal{G}}(\sigma, q_1) = -\infty$). Then $\rho = q_1^{k_1} q_2 q_1^{k_2} q_2 \cdots$, where each $k_i \in \mathbb{N} \setminus \{0\}$. Since $\sigma$ is a finite-memory strategy, there exists $m \in \mathbb{N}$ such that $k_i \leq m$ for all $i \in \mathbb{N}$. Hence, $\mathrm{val}^{\mathcal{G}}(\sigma, q_1) = \mathrm{payoff}(\rho) \leq m/(m+1) < 1$.



3.2 Strategy complexity 

It follows from Martin's determinacy theorem [18] that mean-payoff parity games are determined. Moreover, Chatterjee et al. [5] gave an algorithmic proof of the existence of optimal strategies. Finally, it can be shown that for every $x \in \mathbb{R} \cup \{-\infty\}$ the set $\{\rho \in Q^\omega \mid \mathrm{payoff}(\rho) \geq x\}$ is closed under combinations. By Theorem 4 in [TH], this property implies that Player 2 even has a memoryless optimal strategy. We give here a purely inductive proof of these facts that does not rely on Martin's theorem. We start by proving that Player 1 has an optimal strategy in games where Player 2 is absent.

Lemma 2. Let $\mathcal{G}$ be a mean-payoff parity game with $Q_2 = \emptyset$. Then Player 1 has an optimal strategy in $\mathcal{G}$.

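Proof. It suffices to construct for each $q_0 \in Q$ a strategy $\sigma$ with $\mathrm{val}^{\mathcal{G}}(\sigma, q_0) \geq \underline{\mathrm{val}}^{\mathcal{G}}(q_0)$. If $\underline{\mathrm{val}}^{\mathcal{G}}(q_0) = -\infty$, we can choose an arbitrary strategy $\sigma$. Otherwise, by the definition of $\underline{\mathrm{val}}^{\mathcal{G}}(q_0)$, for each $\varepsilon > 0$ there exists a play $\rho_\varepsilon \in \mathrm{Out}^{\mathcal{G}}(q_0)$ with $\mathrm{payoff}(\rho_\varepsilon) > \underline{\mathrm{val}}^{\mathcal{G}}(q_0) - \varepsilon$. Consider the sets $\mathrm{Inf}(\rho_\varepsilon)$ of states occurring infinitely often in $\rho_\varepsilon$. Since there are only finitely many such sets, we can find a set $P \subseteq Q$ such that for each $\varepsilon > 0$ there exists $0 < \varepsilon' < \varepsilon$ with $P = \mathrm{Inf}(\rho_{\varepsilon'})$. Let $q_{\min} \in P$ be a vertex of lowest priority. (This priority must be even since each $\rho_{\varepsilon'}$ fulfils the parity condition.)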

Let $\sigma_1$ be an optimal memoryless strategy in the mean-payoff game $\mathcal{G}_P = (G \restriction P, 0)$ (the strategy $\sigma_1$ just leads the play to a simple cycle with maximum average weight), and let $\sigma_2$ be the memoryless attractor strategy in the game $\mathcal{G}_P$ that ensures a visit to $q_{\min}$ from all states $q \in P$; we extend both strategies to a strategy in $\mathcal{G}$ by combining them with a memoryless attractor strategy for $P$. (In particular, $\sigma_2$ enforces a visit to $q_{\min}$ from $q_0$.) Note that $\mathrm{val}^{\mathcal{G}_P}(q) \geq \underline{\mathrm{val}}^{\mathcal{G}}(q_0)$ for all $q \in P$ since each of the plays $\rho_{\varepsilon'}$ visits each vertex in $P$ and has payoff $> \underline{\mathrm{val}}^{\mathcal{G}}(q_0) - \varepsilon'$.

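Player 1's optimal strategy $\sigma$ is played in rounds: in the $i$th round, Player 1 first forces a visit to $q_{\min}$ by playing according to $\sigma_2$; once $q_{\min}$ has been visited, Player 1 plays $\sigma_1$ for $i$ steps before proceeding to the next round. Note that $\mathrm{val}^{\mathcal{G}_P}(\sigma, q_{\min}) = \mathrm{val}^{\mathcal{G}_P}(\sigma_1, q_{\min})$. Moreover, the unique play $\rho \in \mathrm{Out}^{\mathcal{G}}(\sigma, q_0)$ satisfies $q_{\min} \in \mathrm{Inf}(\rho) \subseteq P$ and therefore fulfils the parity condition. To sum up, we have

$$\mathrm{val}^{\mathcal{G}}(\sigma, q_0) = \mathrm{val}^{\mathcal{G}}(\sigma, q_{\min}) = \mathrm{val}^{\mathcal{G}_P}(\sigma, q_{\min}) = \mathrm{val}^{\mathcal{G}_P}(\sigma_1, q_{\min}) = \mathrm{val}^{\mathcal{G}_P}(q_{\min}) \geq \underline{\mathrm{val}}^{\mathcal{G}}(q_0). \qquad \square$$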

Using Lemma 2, we can prove that mean-payoff parity games are not only determined, but also that Player 1 has an optimal strategy and that Player 2 has a memoryless optimal strategy.

We use the loop factorisation technique (cf. [23]): Let $\gamma$ be a play prefix and let $q \in Q$. The loop factorisation of $\gamma$ relative to $q$ is the unique factorisation of the form $\gamma = \gamma_0 \gamma_1 \cdots \gamma_l$, where $\gamma_0$ does not contain $q$, and each factor $\gamma_i$, $1 \leq i \leq l$, is of the form $\gamma_i = q \cdot \gamma_i'$ where $\gamma_i'$ does not contain $q$. Analogously, for a play $\rho$ which has infinitely many occurrences of $q$, the loop factorisation of $\rho$ relative to $q$ is the unique factorisation $\rho = \gamma_0 \gamma_1 \cdots$ where each $\gamma_i$ has the same properties as in the above case.

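For a state $q$ with $m$ successors, $qE = \{q_1, \ldots, q_m\}$, we define an operator $\pi_i \colon Q^* \to Q^*$ for each $1 \leq i \leq m$ by setting

$$\pi_i(\gamma) = \begin{cases} \gamma & \text{if either } \gamma = q q_i \gamma' \text{ for some } \gamma' \in Q^* \text{ or } \gamma = q_i = q,\\ \varepsilon & \text{otherwise}. \end{cases}$$

The operator $\pi_i$ induces another operator $\Pi_i \colon Q^* \to Q^*$ by setting

$$\Pi_i(\gamma) = \pi_i(\gamma_0)\,\pi_i(\gamma_1) \cdots \pi_i(\gamma_l),$$

where $\gamma = \gamma_0 \gamma_1 \cdots \gamma_l$ is the loop factorisation of $\gamma$ relative to $q$. The operator $\Pi_i$ operates on play prefixes, but it can easily be extended to operate on infinite plays with infinitely many occurrences of $q$.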
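The loop factorisation and the projection $\Pi_i$ are straightforward to compute on finite histories. In the following Python sketch (names are our own; histories are lists of states), `loop_factorisation` starts a new factor at every occurrence of $q$, and `project` keeps exactly the factors retained by $\pi_i$: loops that leave $q$ towards $q_i$, and a factor consisting of $q$ alone when $q_i = q$ (a self-loop at $q$), which is our reading of the second case of $\pi_i$.

```python
def loop_factorisation(history, q):
    """Split history into gamma_0 gamma_1 ... gamma_l, where gamma_0 avoids q
    and every later factor starts with q and contains no further q."""
    factors = [[]]  # gamma_0 (possibly empty)
    for state in history:
        if state == q:
            factors.append([state])  # a new loop starts at q
        else:
            factors[-1].append(state)
    return factors

def project(history, q, qi):
    """Pi_i(history): concatenate the factors kept by pi_i."""
    kept = []
    for factor in loop_factorisation(history, q):
        if factor[:2] == [q, qi] or (factor == [q] and qi == q):
            kept.extend(factor)
    return kept

h = ["q", "a", "q", "b", "c", "q", "a", "a", "q"]
print(loop_factorisation(h, "q"))
# [[], ['q', 'a'], ['q', 'b', 'c'], ['q', 'a', 'a'], ['q']]
print(project(h, "q", "a"))  # ['q', 'a', 'q', 'a', 'a']
```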

Theorem 3. Let $\mathcal{G}$ be a mean-payoff parity game.

1. $\mathcal{G}$ is determined;

2. Player 1 has an optimal strategy in $\mathcal{G}$;

3. Player 2 has a memoryless optimal strategy in $\mathcal{G}$.
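Proof. We proceed by induction over the size of $S := \{q \in Q_2 \mid |qE| > 1\}$, the set of all Player 2 states with more than one successor. If $S = \emptyset$, all statements follow from Lemma 2. Let 1.–3. be fulfilled for all games with $|S| < n$ and let $\mathcal{G} = (G, \chi)$ be a mean-payoff parity game with $|S| = n$. We prove that the statements also hold for $\mathcal{G}$. Let $q \in S$ with $qE = \{q_1, \ldots, q_m\}$. For each $1 \leq j \leq m$, we define a new game $\mathcal{G}_j = (G_j, \chi)$ by setting $E_j = (E \setminus (\{q\} \times Q)) \cup \{(q, q_j)\}$ and $G_j = (Q_1, Q_2, E_j, \mathrm{weight} \restriction E_j)$. Note that the induction hypothesis applies to each $\mathcal{G}_j$. W.l.o.g. assume that $\mathrm{val}^{\mathcal{G}_1}(q) \leq \mathrm{val}^{\mathcal{G}_j}(q)$ for all $1 \leq j \leq m$. We will construct a memoryless strategy $\tau$ for Player 2 and a strategy $\sigma$ for Player 1 such that $\mathrm{val}^{\mathcal{G}}(\tau, q_0) \leq \mathrm{val}^{\mathcal{G}_1}(q_0)$ and $\mathrm{val}^{\mathcal{G}}(\sigma, q_0) \geq \mathrm{val}^{\mathcal{G}_1}(q_0)$ for every $q_0 \in Q$. Hence,

$$\mathrm{val}^{\mathcal{G}_1}(q_0) \leq \mathrm{val}^{\mathcal{G}}(\sigma, q_0) \leq \underline{\mathrm{val}}^{\mathcal{G}}(q_0) \leq \overline{\mathrm{val}}^{\mathcal{G}}(q_0) \leq \mathrm{val}^{\mathcal{G}}(\tau, q_0) \leq \mathrm{val}^{\mathcal{G}_1}(q_0),$$

and all these numbers are equal. In particular, we have $\underline{\mathrm{val}}^{\mathcal{G}}(q_0) = \overline{\mathrm{val}}^{\mathcal{G}}(q_0) = \mathrm{val}^{\mathcal{G}}(q_0)$, $\mathrm{val}^{\mathcal{G}}(\sigma, q_0) = \mathrm{val}^{\mathcal{G}}(q_0)$ and $\mathrm{val}^{\mathcal{G}}(\tau, q_0) = \mathrm{val}^{\mathcal{G}}(q_0)$, which proves 1.–3.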

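By the induction hypothesis, Player 2 has a memoryless optimal strategy $\tau$ in $\mathcal{G}_1$. Clearly, $\tau$ is also a memoryless strategy for Player 2 in $\mathcal{G}$, and $\mathrm{val}^{\mathcal{G}}(\tau, q_0) = \mathrm{val}^{\mathcal{G}_1}(\tau, q_0) = \mathrm{val}^{\mathcal{G}_1}(q_0)$ for all $q_0 \in Q$.

It remains to construct a strategy $\sigma$ for Player 1 in $\mathcal{G}$ such that $\mathrm{val}^{\mathcal{G}}(\sigma, q_0) \geq \mathrm{val}^{\mathcal{G}_1}(q_0)$ for all $q_0 \in Q$.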

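First, we devise a strategy $\hat\sigma$ such that $\mathrm{val}^{\mathcal{G}}(\hat\sigma, q) \geq \mathrm{val}^{\mathcal{G}_1}(q)$. If $\mathrm{val}^{\mathcal{G}_1}(q) = -\infty$, we can take an arbitrary strategy. Hence, assume that $\mathrm{val}^{\mathcal{G}_1}(q)$ is finite. By the induction hypothesis, for each $j = 1, \ldots, m$ there exists a strategy $\sigma_j$ for Player 1 in $\mathcal{G}_j$ with $\mathrm{val}^{\mathcal{G}_j}(\sigma_j, q) = \mathrm{val}^{\mathcal{G}_j}(q)$. We define $\hat\sigma$ to be the interleaving strategy, defined by

$$\hat\sigma(\gamma) = \hat\sigma(\gamma_0 \cdots \gamma_l) = \begin{cases} \sigma_1(\Pi_1(\gamma)) & \text{if } \gamma_l = q q_1 \gamma' \text{ for some } \gamma' \in Q^*,\\ \quad\vdots & \\ \sigma_m(\Pi_m(\gamma)) & \text{if } \gamma_l = q q_m \gamma' \text{ for some } \gamma' \in Q^*, \end{cases}$$

for all play prefixes $\gamma$ whose loop factorisation relative to $q$ equals $\gamma_0 \cdots \gamma_l$. We claim that $\mathrm{val}^{\mathcal{G}}(\hat\sigma, q) \geq \mathrm{val}^{\mathcal{G}_1}(q)$.

Let $\rho \in \mathrm{Out}^{\mathcal{G}}(\hat\sigma, q)$. If $\rho$ has only finitely many occurrences of $q$, then $\rho$ is equivalent to a play in $\mathcal{G}_j$ that is consistent with $\sigma_j$ for some $j$. Since $\mathrm{val}^{\mathcal{G}_j}(q) \geq \mathrm{val}^{\mathcal{G}_1}(q)$ and $\sigma_j$ is optimal, $\mathrm{payoff}(\rho) \geq \mathrm{val}^{\mathcal{G}_1}(q)$, and we are done. Otherwise, consider the loop factorisation $\rho = \gamma_0 \gamma_1 \cdots$ and set

$$\Gamma = \{j \in \{1, \ldots, m\} \mid \gamma_i \cdot q \text{ is a loop in } \mathcal{G}_j \text{ for infinitely many } i \in \mathbb{N}\}.$$

Since the mean-payoff parity condition is prefix-independent, we can assume w.l.o.g. that every loop in $\rho$ is a loop in $\mathcal{G}_j$ for some $j \in \Gamma$. For each $j \in \Gamma$, denote by $\rho_j = \Pi_j(\rho)$ the corresponding play in $\mathcal{G}_j$. By definition of $\hat\sigma$, we have $\rho_j \in \mathrm{Out}^{\mathcal{G}_j}(\sigma_j, q)$ for each $j \in \Gamma$. Since $\mathrm{val}^{\mathcal{G}_1}(q)$ is finite and $\mathrm{val}^{\mathcal{G}_1}(q) \leq \mathrm{val}^{\mathcal{G}_j}(q)$, each $\rho_j$ fulfils the parity condition. As the minimal priority occurring infinitely often in $\rho$ also occurs infinitely often in some $\rho_j$, this implies that $\rho$ fulfils the parity condition.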
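We claim that for each $n > 0$, $\mathrm{payoff}_n(\rho)$ is a weighted average of the values $\mathrm{payoff}_{n_j}(\rho_j)$ for some $n_j \geq 0$. To see this, consider the loop factorisation $\gamma_0' \cdots \gamma_k'$ of $\rho[0, n]$. (Note that $\gamma_i' = \gamma_i$ for all $i < k$.) For each $j \in \Gamma$, set

$$n_j = \begin{cases} |\Pi_j(\rho[0,n])| - 1 & \text{if } \gamma_k' \text{ is a history of } \mathcal{G}_j \text{ and either } \gamma_k' \neq q \text{ or } q_j = q,\\ |\Pi_j(\rho[0,n])| & \text{otherwise}. \end{cases}$$

Intuitively, $n_j$ is the number of transitions in $\rho[0, n]$ that correspond to a transition in $\rho_j$. Hence, counting transitions with multiplicity,

$$\{(\rho(i), \rho(i+1)) \mid 0 \leq i < n\} = \bigcup_{j \in \Gamma} \{(\rho_j(i), \rho_j(i+1)) \mid 0 \leq i < n_j\}.$$

In particular, $\sum_{j \in \Gamma} n_j = n$ and $\sum_{j \in \Gamma} n_j / n = 1$. We have

$$\begin{aligned}
\mathrm{payoff}_n(\rho) &= \frac{1}{n} \sum_{i=0}^{n-1} \mathrm{weight}(\rho(i), \rho(i+1))\\
&= \sum_{\substack{j \in \Gamma\\ n_j > 0}} \frac{1}{n} \sum_{i=0}^{n_j - 1} \mathrm{weight}(\rho_j(i), \rho_j(i+1))\\
&= \sum_{\substack{j \in \Gamma\\ n_j > 0}} \frac{n_j}{n} \cdot \frac{1}{n_j} \sum_{i=0}^{n_j - 1} \mathrm{weight}(\rho_j(i), \rho_j(i+1))\\
&= \sum_{\substack{j \in \Gamma\\ n_j > 0}} \frac{n_j}{n} \cdot \mathrm{payoff}_{n_j}(\rho_j).
\end{aligned}$$

Since a weighted average is always bounded from below by its minimum element, we can conclude that

$$\mathrm{payoff}_n(\rho) \geq \min_{\substack{j \in \Gamma\\ n_j > 0}} \mathrm{payoff}_{n_j}(\rho_j) \geq \min_{j \in \Gamma} \mathrm{payoff}_{n_j}(\rho_j).$$

Taking the lower limit on both sides, and using that $n_j \to \infty$ as $n \to \infty$ for every $j \in \Gamma$, we obtain

$$\begin{aligned}
\mathrm{payoff}(\rho) = \liminf_{n \to \infty} \mathrm{payoff}_n(\rho) &\geq \liminf_{n \to \infty} \min_{j \in \Gamma} \mathrm{payoff}_{n_j}(\rho_j)\\
&= \min_{j \in \Gamma} \liminf_{n \to \infty} \mathrm{payoff}_{n_j}(\rho_j)\\
&\geq \min_{j \in \Gamma} \liminf_{n \to \infty} \mathrm{payoff}_{n}(\rho_j)\\
&= \min_{j \in \Gamma} \mathrm{payoff}(\rho_j).
\end{aligned}$$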
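Since each $\rho_j$ is consistent with $\sigma_j$ and $\sigma_j$ is optimal, we have $\mathrm{payoff}(\rho_j) \geq \mathrm{val}^{\mathcal{G}_j}(q) \geq \mathrm{val}^{\mathcal{G}_1}(q)$ for each $j \in \Gamma$ and therefore also $\mathrm{payoff}(\rho) \geq \mathrm{val}^{\mathcal{G}_1}(q)$. Since this holds for all $\rho \in \mathrm{Out}^{\mathcal{G}}(\hat\sigma, q)$, we can conclude that $\mathrm{val}^{\mathcal{G}}(\hat\sigma, q) \geq \mathrm{val}^{\mathcal{G}_1}(q)$.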

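Finally, we construct a strategy $\sigma$ for Player 1 in $\mathcal{G}$ such that $\mathrm{val}^{\mathcal{G}}(\sigma, q_0) \geq \mathrm{val}^{\mathcal{G}_1}(q_0)$ for all $q_0 \in Q$. Let

$$\sigma(\gamma) = \begin{cases} \sigma_1(\gamma) & \text{if } q \text{ does not occur in } \gamma,\\ \hat\sigma(q\gamma_2) & \text{if } \gamma = \gamma_1 q \gamma_2 \text{ with } \gamma_1 \in (Q \setminus \{q\})^*. \end{cases}$$

Then for each play $\rho \in \mathrm{Out}^{\mathcal{G}}(\sigma, q_0)$ where $q$ does not occur, it holds that $\mathrm{payoff}^{\mathcal{G}}(\rho) = \mathrm{payoff}^{\mathcal{G}_1}(\rho) \geq \mathrm{val}^{\mathcal{G}_1}(\sigma_1, q_0) = \mathrm{val}^{\mathcal{G}_1}(q_0)$. If $q$ occurs in at least one play consistent with $\sigma$, then in the game $\mathcal{G}_1$ (where $\sigma_1$ is optimal), we have $\mathrm{val}^{\mathcal{G}_1}(q_0) = \mathrm{val}^{\mathcal{G}_1}(\sigma_1, q_0) \leq \mathrm{val}^{\mathcal{G}_1}(q)$. Hence, for each play $\rho \in \mathrm{Out}^{\mathcal{G}}(\sigma, q_0)$ where $q$ occurs (say at position $j$), it holds that $\mathrm{payoff}^{\mathcal{G}}(\rho) = \mathrm{payoff}^{\mathcal{G}}(\rho[j, \infty)) \geq \mathrm{val}^{\mathcal{G}}(\hat\sigma, q) \geq \mathrm{val}^{\mathcal{G}_1}(q) \geq \mathrm{val}^{\mathcal{G}_1}(q_0)$. Altogether we have $\mathrm{payoff}^{\mathcal{G}}(\rho) \geq \mathrm{val}^{\mathcal{G}_1}(q_0)$ for every play $\rho \in \mathrm{Out}^{\mathcal{G}}(\sigma, q_0)$ and therefore $\mathrm{val}^{\mathcal{G}}(\sigma, q_0) \geq \mathrm{val}^{\mathcal{G}_1}(q_0)$. □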

A consequence of the proofs of Lemma 2 and Theorem 3 is that each value of a mean-payoff parity game is either $-\infty$ or equals one of the values of a mean-payoff game played on the same weighted graph (or a subarena of it). Since optimal memoryless strategies exist in mean-payoff games [TT], the values of a mean-payoff game with integral weights are rational numbers of the form $r/s$ with $|r| \leq |Q| \cdot W$ and $|s| \leq |Q|$. Consequently, this property holds for the (finite) values of a mean-payoff parity game as well.
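The bound makes the set of possible finite values explicit: the value of a mean-payoff game is the average weight of a simple cycle, i.e. a sum of at most $|Q|$ weights of absolute value at most $W$, divided by the cycle length. The Python sketch below (the function name is hypothetical) enumerates this candidate set, normalising the denominator to be positive; it is only meant to make the bound concrete.

```python
from fractions import Fraction

def candidate_values(num_states: int, W: int) -> list[Fraction]:
    """All fractions r/s with |r| <= num_states * W and 1 <= s <= num_states."""
    values = {Fraction(r, s)
              for s in range(1, num_states + 1)
              for r in range(-num_states * W, num_states * W + 1)}
    return sorted(values)

print([str(v) for v in candidate_values(2, 1)])
# ['-2', '-1', '-1/2', '0', '1/2', '1', '2']
```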

While Example 1 demonstrates that an optimal strategy of Player 1 requires infinite memory in general, this is not the case for mean-payoff co-Büchi games, where both players have memoryless optimal strategies. This can be seen by applying Theorem 2 of [13] or by an inductive proof, which we provide here.

Theorem 4. Let $\mathcal{G}$ be a mean-payoff co-Büchi game. Then Player 1 has a memoryless optimal strategy from every state $q_0 \in Q$.

Proof. The proof is by induction over the number $|Q| = n$ of states in $\mathcal{G}$. For $n = 1$, the statement is trivially fulfilled. Now let $n > 1$, $q_0 \in Q$, and assume that the statement is true for all games with fewer than $n$ states. Define $Q' = Q \setminus \mathrm{Attr}_2(\chi^{-1}(1))$. If $Q' = \emptyset$, then Player 2 can force visiting $\chi^{-1}(1)$ infinitely often by playing a memoryless attractor strategy. Hence, $\mathrm{val}^{\mathcal{G}}(q_0) = -\infty$, and every memoryless strategy of Player 1 is optimal. In the following, assume that $Q' \neq \emptyset$. Consider the game $\mathcal{G}' := \mathcal{G} \restriction Q'$, which is a mean-payoff game, and set

$$S := \{q \in Q' \mid \mathrm{val}^{\mathcal{G}'}(q) \geq \mathrm{val}^{\mathcal{G}}(q_0)\}.$$

Note that $S$ is a trap for Player 2 both in $\mathcal{G}'$ and in $\mathcal{G}$ (since $Q'$ is a 2-trap in $\mathcal{G}$). We claim that $S \neq \emptyset$. Towards a contradiction, assume that $S = \emptyset$, i.e., $\mathrm{val}^{\mathcal{G}'}(q) < \mathrm{val}^{\mathcal{G}}(q_0)$ for all $q \in Q'$, and let $\tau$ be an optimal memoryless strategy for Player 2 in $\mathcal{G}'$. We extend $\tau$ to a strategy in $\mathcal{G}$ by combining it with a memoryless attractor strategy for $\chi^{-1}(1)$ on $\mathrm{Attr}_2(\chi^{-1}(1))$. Let $\rho \in \mathrm{Out}^{\mathcal{G}}(\tau, q_0)$ and $m := \max_{q \in Q'} \mathrm{val}^{\mathcal{G}'}(q)$. Either $\rho$ visits $\mathrm{Attr}_2(\chi^{-1}(1))$, and therefore also $\chi^{-1}(1)$, infinitely often, in which case $\mathrm{payoff}(\rho) = -\infty < m$, or $\rho[i, \infty)$ is a play of $\mathcal{G}'$ for some $i \in \mathbb{N}$, in which case $\mathrm{payoff}(\rho) = \mathrm{payoff}(\rho[i, \infty)) \leq \mathrm{val}^{\mathcal{G}'}(\rho(i)) \leq m$. Hence, $\mathrm{val}^{\mathcal{G}}(q_0) \leq \mathrm{val}^{\mathcal{G}}(\tau, q_0) \leq m < \mathrm{val}^{\mathcal{G}}(q_0)$, a contradiction.

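Now, let $\sigma'$ be a memoryless optimal strategy of Player 1 in $\mathcal{G}'$. By the definition of $S$, we have $\mathrm{val}^{\mathcal{G}'}(\sigma', q) \geq \mathrm{val}^{\mathcal{G}}(q_0)$ for all $q \in S$. Moreover, $\sigma'$ induces a memoryless strategy $\sigma_S$ in $\mathcal{G} \restriction S$ such that $\mathrm{val}^{\mathcal{G} \restriction S}(\sigma_S, q) = \mathrm{val}^{\mathcal{G}'}(\sigma', q) \geq \mathrm{val}^{\mathcal{G}}(q_0)$ for all $q \in S$. Let $A = \mathrm{Attr}_1(S)$. We extend $\sigma_S$ to a memoryless strategy $\sigma_A$ in $\mathcal{G} \restriction A$ by combining it with a memoryless attractor strategy for $S$ on $A \setminus S$. It follows that $\mathrm{val}^{\mathcal{G} \restriction A}(\sigma_A, q) \geq \mathrm{val}^{\mathcal{G}}(q_0)$ for all $q \in \mathrm{Attr}_1(S)$. If $q_0 \in \mathrm{Attr}_1(S)$, we are done. Otherwise, $q_0 \in T := Q \setminus A$. Since $S \neq \emptyset$, the game $\mathcal{G} \restriction T$ has fewer states than $\mathcal{G}$, and by the induction hypothesis, Player 1 has a memoryless optimal strategy $\sigma_T$ from $q_0$ in $\mathcal{G} \restriction T$. Note that, since $T$ is a trap for Player 1, we have $\mathrm{val}^{\mathcal{G} \restriction T}(\sigma_T, q_0) = \mathrm{val}^{\mathcal{G} \restriction T}(q_0) \geq \mathrm{val}^{\mathcal{G}}(q_0)$. Let $\sigma$ be the union of $\sigma_A$ and $\sigma_T$, which is a memoryless strategy in $\mathcal{G}$. We claim that $\sigma$ is optimal from $q_0$ in $\mathcal{G}$. Let $\rho \in \mathrm{Out}^{\mathcal{G}}(\sigma, q_0)$. If $\rho$ stays in $T$, it is consistent with $\sigma_T$ and must have payoff at least $\mathrm{val}^{\mathcal{G} \restriction T}(\sigma_T, q_0) \geq \mathrm{val}^{\mathcal{G}}(q_0)$. Otherwise, there exists $i \in \mathbb{N}$ such that $\rho(i) \in A$ and $\rho[i, \infty)$ is consistent with $\sigma_A$, which implies $\mathrm{payoff}(\rho) = \mathrm{payoff}(\rho[i, \infty)) \geq \mathrm{val}^{\mathcal{G} \restriction A}(\sigma_A, \rho(i)) \geq \mathrm{val}^{\mathcal{G}}(q_0)$. □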

3.3 Computational complexity 

In this section, we prove that the value problem for mean-payoff parity games lies in NP ∩ coNP. Although this has already been proved by Chatterjee and Doyen [6], our proof has the advantage that it works directly on mean-payoff parity games, and not on energy parity games as in [6].

In order to put the value problem for mean-payoff parity games into coNP, we first show that the value can be decided in polynomial time in games where Player 2 is absent.

Proposition 5. The problem of deciding, given a mean-payoff parity game $\mathcal{G}$ with $Q_2 = \emptyset$, a state $q_0 \in Q$, and $x \in \mathbb{Q}$, whether $\mathrm{val}^{\mathcal{G}}(q_0) \geq x$, is in P.

Proof. Deciding whether $\mathrm{val}^{\mathcal{G}}(q_0) \geq x$ is achieved by Algorithm 1, which employs as subroutines Tarjan's linear-time algorithm [TD] for SCC decomposition and Karp's polynomial-time algorithm [15] for computing the minimum/maximum cycle weight (i.e. the minimum/maximum average weight on a cycle) in a given strongly connected graph.

Algorithm 1. A polynomial-time algorithm for deciding the value of a state in a one-player mean-payoff parity game.

Input: mean-payoff parity game $\mathcal{G}$ with $Q_2 = \emptyset$, $q_0 \in Q$, $x \in \mathbb{Q}$
Output: whether $\mathrm{val}^{\mathcal{G}}(q_0) \geq x$

    $G' := G \restriction \{q \in Q \mid q \text{ is reachable from } q_0\}$
    for each even $p \in \chi(Q)$ do
        $G_p := G' \restriction \{q \in Q \mid \chi(q) \geq p\}$
        decompose $G_p$ into SCCs
        for each SCC $C$ of $G_p$ with $p \in \chi(C)$ do
            compute maximum cycle weight $w$ in $C$
            if $w \geq x$ then accept
        done
    done
    reject

The algorithm is sound: If the algorithm accepts, then there is an even priority $p$ and a reachable SCC $C$ in $G_p$ with $p \in \chi(C)$ that has maximum cycle weight $w \geq x$. We construct a strategy $\sigma$ for Player 1 with $\mathrm{val}^{\mathcal{G}}(\sigma, q_0) = w$. Let $q \in C$ be a state with priority $p$. Since $q$ is reachable from $q_0$ and $C$ is strongly connected, both $q_0$ and $C$ lie inside $\mathrm{Attr}_1(\{q\})$. Let $\sigma_q$ be the memoryless attractor strategy for $\{q\}$. Now, since $w$ is the maximum cycle weight in $C$, there exists a simple cycle $\gamma = q_1 \cdots q_n q_1$ in $C$ with cycle weight $w$. We construct a (memoryless) strategy $\sigma_\gamma$ on $C$ by setting $\sigma_\gamma(q_n) = q_1$ and $\sigma_\gamma(q_i) = q_{i+1}$ for every $1 \leq i < n$; this strategy is extended to the whole game by combining it with an attractor strategy for $\{q_1, \ldots, q_n\}$. The strategies $\sigma_q$ and $\sigma_\gamma$ are then combined into a strategy $\sigma$, which is played in rounds: in the $i$th round, Player 1 first forces a visit to $\chi^{-1}(p) \cap C$ by playing according to $\sigma_q$; once $\chi^{-1}(p) \cap C$ has been reached, Player 1 plays $\sigma_\gamma$ for $i$ steps before proceeding to the next round. Note that $\sigma$ fulfils the parity condition because $q$ is visited infinitely often and all other priorities that appear infinitely often are at least $p$. Finally, the payoff of $\rho(\sigma, q_0)$ equals the cycle weight of $\gamma$, i.e., $\mathrm{val}^{\mathcal{G}}(q_0) \geq \mathrm{val}^{\mathcal{G}}(\sigma, q_0) = w \geq x$.

The algorithm is complete: Assume that $\mathrm{val}^{\mathcal{G}}(q_0) = w \geq x$ and let $\rho \in \mathrm{Out}^{\mathcal{G}}(q_0)$ be a play with $\mathrm{payoff}^{\mathcal{G}}(\rho) = w$; such a play exists due to Lemma 2. Consider the set $\mathrm{Inf}(\rho)$ and let $p = \min \chi(\mathrm{Inf}(\rho))$ (which is even since $\mathrm{payoff}(\rho)$ is finite). Since $\mathrm{Inf}(\rho)$ is strongly connected, $\mathrm{Inf}(\rho) \subseteq C$ for an SCC $C$ of $G_p$ with $p \in \chi(C)$. Since optimal memoryless strategies exist in mean-payoff games, there exists a simple cycle with average weight at least $w$ in $C$. Hence the algorithm accepts.

Since SCC decomposition and maximum cycle weight computation both take 
polynomial time, the whole algorithm runs in polynomial time. □ 
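The following Python sketch transcribes Algorithm 1 under simplifying assumptions of our own: states are hashable labels, `succ[q]` maps each successor of `q` to the integer weight of that edge, and `prio[q]` is the priority $\chi(q)$. As in the proof, SCCs are computed with Tarjan's algorithm and the maximum cycle weight with the maximisation variant of Karp's algorithm; exact arithmetic uses `fractions.Fraction`. It is an illustration rather than an optimised implementation (the recursion in `sccs` is unsuitable for very large graphs).

```python
from fractions import Fraction

def sccs(nodes, succ):
    """Tarjan's algorithm on the subgraph induced by the set `nodes`."""
    index, low, on_stack, stack, comps = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in nodes:
                continue  # edges leaving `nodes` are ignored
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(comp)

    for v in nodes:
        if v not in index:
            visit(v)
    return comps

def max_cycle_mean(comp, succ):
    """Karp's algorithm (maximisation variant) on the strongly connected
    subgraph induced by `comp`; returns None if `comp` contains no cycle."""
    nodes = list(comp)
    n = len(nodes)
    # D[k][v] = maximum weight of a walk with exactly k edges from nodes[0] to v
    D = [dict.fromkeys(nodes) for _ in range(n + 1)]
    D[0][nodes[0]] = 0
    for k in range(1, n + 1):
        for u in nodes:
            if D[k - 1][u] is None:
                continue
            for v, w in succ[u].items():
                if v in comp and (D[k][v] is None or D[k - 1][u] + w > D[k][v]):
                    D[k][v] = D[k - 1][u] + w
    best = None
    for v in nodes:
        if D[n][v] is None:
            continue
        val = min(Fraction(D[n][v] - D[k][v], n - k)
                  for k in range(n) if D[k][v] is not None)
        if best is None or val > best:
            best = val
    return best

def value_at_least(succ, prio, q0, x):
    """Algorithm 1: decide val(q0) >= x in a one-player mean-payoff parity game."""
    reach, todo = {q0}, [q0]  # states of G' (reachable from q0)
    while todo:
        for v in succ[todo.pop()]:
            if v not in reach:
                reach.add(v)
                todo.append(v)
    for p in sorted({prio[q] for q in reach if prio[q] % 2 == 0}):
        g_p = {q for q in reach if prio[q] >= p}
        for comp in sccs(g_p, succ):
            if any(prio[q] == p for q in comp):
                w = max_cycle_mean(comp, succ)
                if w is not None and w >= x:
                    return True  # accept
    return False  # reject

# Hypothetical game: q1 (priority 1) has a self-loop of weight 1 and an edge
# of weight 1 to q2 (priority 0), which returns to q1 with weight 0.
succ = {"q1": {"q1": 1, "q2": 1}, "q2": {"q1": 0}}
prio = {"q1": 1, "q2": 0}
print(value_at_least(succ, prio, "q1", 1))               # True
print(value_at_least(succ, prio, "q1", Fraction(3, 2)))  # False
```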

It follows from Theorem 3 and Proposition 5 that the value problem for mean-payoff parity games is in coNP: to decide whether $\mathrm{val}^{\mathcal{G}}(q_0) < x$, a nondeterministic algorithm can guess a memoryless strategy $\tau$ for Player 2 and check whether $\mathrm{val}^{\mathcal{G}}(\tau, q_0) < x$ in polynomial time.

Corollary 6. The value problem for mean-payoff parity games is in coNP. 

Algorithm 2. A nondeterministic algorithm for deciding the value of a state in a mean-payoff parity game.

Input: mean-payoff parity game $\mathcal{G}$, state $q_0 \in Q$, $x \in \mathbb{Q}$

    guess 2-trap $T$ in $\mathcal{G}$ with $q_0 \in T$
    Check($T$)
    accept

    procedure Check($S$)
        if $S \neq \emptyset$ then
            $p := \min\{\chi(q) \mid q \in S\}$
            if $p$ is even then
                guess memoryless strategy $\sigma_M$ for Player 1 in $G \restriction S$
                if $\mathrm{val}^{(G \restriction S, 0)}(\sigma_M, q) < x$ for some $q \in S$ then reject
                Check($S \setminus \mathrm{Attr}_1^{G \restriction S}(\chi^{-1}(p))$)
            else
                guess 2-trap $T \neq \emptyset$ in $\mathcal{G} \restriction (S \setminus \mathrm{Attr}_2^{G \restriction S}(\chi^{-1}(p)))$
                Check($T$); Check($S \setminus \mathrm{Attr}_1^{G \restriction S}(T)$)
            end if
        end if
    end procedure

Following ideas from [6], we prove that the value problem is not only in coNP, but also in NP. The core of Algorithm 2 is the procedure Check that, on input $S$, checks whether the value of all states in the game $\mathcal{G} \restriction S$ is at least $x$. If the least priority $p$ in $S$ is even, this is witnessed by a strategy in the mean-payoff game $(G \restriction S, 0)$ that ensures payoff $\geq x$ and the fact that the values of all states in the game $\mathcal{G} \restriction (S \setminus \mathrm{Attr}_1^{G \restriction S}(\chi^{-1}(p)))$ are at least $x$, which we can check by calling Check recursively. If, on the other hand, the least priority $p$ in $S$ is odd, then $\mathrm{val}^{\mathcal{G} \restriction S}(q) \geq x$ for all $q \in S$ is witnessed by a 2-trap $T$ inside $S \setminus \mathrm{Attr}_2^{G \restriction S}(\chi^{-1}(p))$ such that both the values in the game $\mathcal{G} \restriction T$ and the values in the game $\mathcal{G} \restriction (S \setminus \mathrm{Attr}_1^{G \restriction S}(T))$ are bounded from below by $x$; the latter two properties can again be checked by calling Check recursively. The correctness of the algorithm relies on the following two lemmas.

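Lemma 7. Let $\mathcal{G}$ be a mean-payoff parity game with least priority $p$ even, $T = Q \setminus \mathrm{Attr}_1(\chi^{-1}(p))$, and $x \in \mathbb{R}$. If $\mathrm{val}^{(G,0)}(q) \ge x$ for all $q \in Q$ and $\mathrm{val}^{\mathcal{G}\restriction T}(q) \ge x$ for all $q \in T$, then $\mathrm{val}^{\mathcal{G}}(q) \ge x$ for all $q \in Q$.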

Proof. Assume that $\mathrm{val}^{(G,0)}(q) \ge x$ for all $q \in Q$ and $\mathrm{val}^{\mathcal{G}\restriction T}(q) \ge x$ for all $q \in T$, and let $q^* \in Q$. By Theorem 3, it suffices to show that for every memoryless strategy $\tau$ of Player 2 there exists a strategy $\sigma$ of Player 1 such that $\mathrm{payoff}(\rho(\sigma, \tau, q^*)) \ge x$. Hence, assume that $\tau$ is a memoryless strategy of Player 2 in $\mathcal{G}$. Moreover, let $\sigma_M$ be a memoryless strategy for Player 1 in $(G, 0)$ with $\mathrm{val}^{(G,0)}(\sigma_M, q) \ge x$ for all $q \in Q$, let $\sigma_T$ be a strategy for Player 1 in $\mathcal{G}\restriction T$ with $\mathrm{val}^{\mathcal{G}\restriction T}(\sigma_T, q) \ge x$ for all $q \in T$, and let $\sigma_A$ be a memoryless attractor strategy of Player 1 on $\mathrm{Attr}_1(\chi^{-1}(p))$ that ensures to reach $\chi^{-1}(p)$. We combine these three strategies to a new strategy $\sigma$, which is played in rounds. In the $k$th round, the strategy behaves as follows:

1. while the play stays inside $T$, play $\sigma_T$;

2. as soon as the play reaches $\mathrm{Attr}_1(\chi^{-1}(p))$, switch to strategy $\sigma_A$ and play $\sigma_A$ until the play reaches $\chi^{-1}(p)$;

3. when the play reaches $\chi^{-1}(p)$, play $\sigma_M$ for exactly $k$ steps and proceed to the next round.

Let $\rho := \rho(\sigma, \tau, q^*)$. To complete the proof, we need to show that $\mathrm{payoff}(\rho) \ge x$. We distinguish whether $\rho$ visits $\mathrm{Attr}_1(\chi^{-1}(p))$ infinitely often or not.

In the first case, we divide $\rho$ into $\rho = \gamma_0\gamma_1\gamma_2\cdots$, where each $\gamma_k = \gamma_k^T\gamma_k^A\gamma_k^M$ consists of a part consistent with $\sigma_T$ (thus staying inside $T$), a part consistent with $\sigma_A$ (thus staying in $\mathrm{Attr}_1(\chi^{-1}(p))$), and one that starts with a state in $\chi^{-1}(p)$ and is consistent with $\sigma_M$. Since $\tau$ is a memoryless strategy, there can only be $|T|$ many different $\gamma_k^T$, and the length of each $\gamma_k^T$ is bounded by some constant $K$. Since each $\gamma_k^A$ is consistent with an attractor strategy, the length of each $\gamma_k^A$ is bounded by $|Q|$. Hence, the length of $\gamma_k^M$ grows unboundedly with $k$, while the length of $\gamma_k^T\gamma_k^A$ is bounded. Therefore, $\liminf_{n\to\infty} \mathrm{payoff}_n(\rho) = \liminf_{n\to\infty} \mathrm{payoff}_n(\gamma_1^M\gamma_2^M\cdots)$. Since $\mathrm{val}^{(G,0)}(\sigma_M, q) \ge x$ for all $q \in Q$ and priority $p$ is visited infinitely often, we have $\mathrm{payoff}(\rho) = \liminf_{n\to\infty} \mathrm{payoff}_n(\rho) \ge x$.

In the second case, $\rho = \gamma \cdot \rho'$, where $\rho'$ is a play of $\mathcal{G}\restriction T$ that is consistent with $\sigma_T$. Hence, $\mathrm{payoff}(\rho) = \mathrm{payoff}(\rho') \ge \mathrm{val}^{\mathcal{G}\restriction T}(\sigma_T, \rho'(0)) \ge x$. □

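Lemma 8. Let $\mathcal{G}$ be a mean-payoff parity game with least priority $p$ odd, $T = Q \setminus \mathrm{Attr}_2(\chi^{-1}(p))$, and $x \in \mathbb{R}$. If $\mathrm{val}^{\mathcal{G}}(q) \ge x$ for some $q \in Q$, then $T \neq \emptyset$ and $\mathrm{val}^{\mathcal{G}\restriction T}(q) \ge x$ for some $q \in T$.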

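Proof. Let $q^* \in Q$ be a state with $\mathrm{val}^{\mathcal{G}}(q^*) \ge x$. If $T = \emptyset$, then $\mathrm{Attr}_2(\chi^{-1}(p)) = Q$ and there is a memoryless attractor strategy $\tau$ for Player 2 in $\mathcal{G}$ that ensures to visit $\chi^{-1}(p)$ infinitely often. This implies $\mathrm{val}^{\mathcal{G}}(\tau, q^*) = -\infty$, a contradiction to $\mathrm{val}^{\mathcal{G}}(q^*) \ge x$. Thus $T \neq \emptyset$.

Now assume that $\mathrm{val}^{\mathcal{G}\restriction T}(q) < x$ for all $q \in T$, and let $\tau$ be a (w.l.o.g. memoryless) strategy for Player 2 in $\mathcal{G}\restriction T$ that ensures $\mathrm{val}^{\mathcal{G}\restriction T}(\tau, q) < x$ for all $q \in T$. We extend $\tau$ to a strategy $\tau'$ in $\mathcal{G}$ by combining it with a memoryless attractor strategy for $\chi^{-1}(p)$ on the states in $Q \setminus T$. Let $\rho \in \mathrm{Out}^{\mathcal{G}}(\tau', q^*)$. Either $\rho$ reaches $\chi^{-1}(p)$ infinitely often, in which case $\mathrm{payoff}^{\mathcal{G}}(\rho) = -\infty$, or there is a position $i$ from which onwards $\rho$ stays in $T$, in which case $\mathrm{payoff}^{\mathcal{G}}(\rho) = \mathrm{payoff}^{\mathcal{G}\restriction T}(\rho[i,\infty)) \le \mathrm{val}^{\mathcal{G}\restriction T}(\tau, \rho(i))$. In any case, $\mathrm{val}^{\mathcal{G}}(\tau', q^*) \le \max_{q \in T} \mathrm{val}^{\mathcal{G}\restriction T}(\tau, q) < x$, a contradiction to $\mathrm{val}^{\mathcal{G}}(q^*) \ge x$. □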

Finally, Algorithm 2 runs in polynomial time because the value of a memoryless strategy in a mean-payoff game can be computed in polynomial time [24] and because recursive calls are limited to disjoint subarenas.

Theorem 9. The value problem for mean-payoff parity games is in NP. 

Proof. We claim that Algorithm 2 is a nondeterministic polynomial-time algorithm for the value problem. To analyse the running time, denote by $T(n)$ the worst-case running time of the procedure Check on a subarena $S$ of size $n$. Since the value of a memoryless strategy for Player 1 in a mean-payoff game can be computed in polynomial time [24] and attractor computations take linear time, there exists a polynomial $f\colon \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ such that the numbers $T(n)$ satisfy the following recurrence:

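$$T(1) \le f(\|\mathcal{G}\|, \|x\|),$$
$$T(n) \le \max_{1 \le k < n} \bigl(T(k) + T(n-k)\bigr) + f(\|\mathcal{G}\|, \|x\|).$$

Solving this recurrence (by induction, using $(2k-1) + (2(n-k)-1) + 1 = 2n-1$), we get that $T(n) \le (2n-1) \cdot f(\|\mathcal{G}\|, \|x\|)$ for all $n \ge 1$, again a polynomial. Consequently, the algorithm runs in polynomial time.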
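The linear-time attractor computation used by Check (and later by SolveMPP) is the standard backward search with successor counters. A minimal sketch, assuming the arena is given by Python dicts `owner` (values 1 or 2) and `succ`; the function name `attractor` is hypothetical:

```python
from collections import deque


def attractor(player, arena, owner, succ, target):
    """Attr_player(target) inside the subarena `arena` (a set of states).

    owner[q] is 1 or 2 and succ[q] lists the successors of q.
    Every edge is inspected a constant number of times, so the running
    time is linear in the size of the subarena.
    """
    pred = {q: [] for q in arena}
    remaining = {}  # opponent states: successors not yet in the attractor
    for q in arena:
        outs = [r for r in succ[q] if r in arena]
        remaining[q] = len(outs)
        for r in outs:
            pred[r].append(q)
    attr = {t for t in target if t in arena}
    queue = deque(attr)
    while queue:
        r = queue.popleft()
        for q in pred[r]:
            if q in attr:
                continue
            if owner[q] == player:
                attr.add(q)  # the player can move into the attractor directly
                queue.append(q)
            else:
                remaining[q] -= 1
                if remaining[q] == 0:  # every opponent move leads into it
                    attr.add(q)
                    queue.append(q)
    return attr
```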

To prove the correctness of the algorithm, we need to prove that the algorithm is both sound and complete. We start by proving soundness: If the algorithm accepts its input, then $\mathrm{val}^{\mathcal{G}}(q_0) \ge x$. In fact, we prove the following stronger statement. We say that Check($S$) succeeds if the procedure terminates without rejection (for at least one sequence of guesses).

Claim. Let $S \subseteq Q$. If $S$ is a subarena of $\mathcal{G}$ and Check($S$) succeeds, then $\mathrm{val}^{\mathcal{G}\restriction S}(q) \ge x$ for all $q \in S$.

Assume that the claim is true and that the algorithm accepts its input. Then there exists a 2-trap $T$ with $q_0 \in T$ such that $\mathrm{val}^{\mathcal{G}\restriction T}(q) \ge x$ for all $q \in T$. Since $T$ is a 2-trap, it follows that $\mathrm{val}^{\mathcal{G}}(q_0) \ge x$.

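To prove the claim, we proceed by induction over the cardinality of $S$. If $|S| = 0$, the claim is trivially fulfilled. Hence, assume that $|S| > 0$ and that the claim is true for all sets $S' \subseteq Q$ with $|S'| < |S|$. Let $p := \min\{\chi(q) \mid q \in S\}$. We distinguish two cases:

1. The minimal priority $p$ is even. Since Check($S$) succeeds, there exists a memoryless strategy $\sigma_M$ of Player 1 in $\mathcal{G}\restriction S$ such that $\mathrm{val}^{(G\restriction S, 0)}(\sigma_M, q) \ge x$ for all $q \in S$, i.e. $\mathrm{val}^{(G\restriction S, 0)}(q) \ge x$ for all $q \in S$. Let $A := \mathrm{Attr}_1^{\mathcal{G}\restriction S}(\chi^{-1}(p))$. Since Check($S$) succeeds, so does Check($S \setminus A$). Hence, by the induction hypothesis, $\mathrm{val}^{\mathcal{G}\restriction(S \setminus A)}(q) \ge x$ for all $q \in S \setminus A$. By Lemma 7, these two facts imply that $\mathrm{val}^{\mathcal{G}\restriction S}(q) \ge x$ for all $q \in S$.

2. The minimal priority $p$ is odd. Since Check($S$) succeeds, there exists a 2-trap $T \neq \emptyset$ in $\mathcal{G}\restriction(S \setminus \mathrm{Attr}_2^{\mathcal{G}\restriction S}(\chi^{-1}(p)))$ such that both Check($T$) and Check($S \setminus \mathrm{Attr}_1^{\mathcal{G}\restriction S}(T)$) succeed. Let $A := \mathrm{Attr}_1^{\mathcal{G}\restriction S}(T)$. By the induction hypothesis, Player 1 has a strategy $\sigma_T$ in $\mathcal{G}\restriction T$ such that $\mathrm{val}^{\mathcal{G}\restriction T}(\sigma_T, q) \ge x$ for all $q \in T$ and a strategy $\sigma_S$ in $\mathcal{G}\restriction(S \setminus A)$ such that $\mathrm{val}^{\mathcal{G}\restriction(S \setminus A)}(\sigma_S, q) \ge x$ for all $q \in S \setminus A$. We extend $\sigma_T$ to a strategy $\sigma_A$ in $\mathcal{G}\restriction S$ such that $\mathrm{val}^{\mathcal{G}\restriction S}(\sigma_A, q) \ge x$ for all $q \in A$ by combining $\sigma_T$ with a suitable attractor strategy. By playing $\sigma_S$ as long as the play stays in $S \setminus A$ and switching to $\sigma_A$ as soon as the play enters $A$, Player 1 can ensure that $\mathrm{val}^{\mathcal{G}\restriction S}(q) \ge x$ for all $q \in S$.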

Finally, we prove that the algorithm is complete: if $\mathrm{val}^{\mathcal{G}}(q_0) \ge x$, then the algorithm accepts the input $\mathcal{G}, q_0, x$. Since the set $\{q \in Q \mid \mathrm{val}^{\mathcal{G}}(q) \ge x\}$ is a trap for Player 2, it suffices to prove the following claim.

Claim. Let $S \subseteq Q$. If $S$ is a subarena of $\mathcal{G}$ and $\mathrm{val}^{\mathcal{G}\restriction S}(q) \ge x$ for all $q \in S$, then Check($S$) succeeds.

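As for the previous claim, we prove this claim by induction over the cardinality of $S$. Clearly, Check($S$) succeeds if $|S| = 0$. Hence, assume that $|S| > 0$ and that the claim is correct for all sets $S' \subseteq Q$ with $|S'| < |S|$. Moreover, assume that $S$ is a subarena of $\mathcal{G}$ such that $\mathrm{val}^{\mathcal{G}\restriction S}(q) \ge x$ for all $q \in S$ (otherwise the claim is trivially fulfilled). Again, we distinguish whether $p := \min\{\chi(q) \mid q \in S\}$ is even or odd.

Algorithm SolveMPP($\mathcal{G}$)

Input: mean-payoff parity game $\mathcal{G} = (G, \chi)$
Output: $\mathrm{val}^{\mathcal{G}}$

if $Q = \emptyset$ then return $\emptyset$
$p := \min\{\chi(q) \mid q \in Q\}$
if $p$ is even then
  $g :=$ SolveMP($G, 0$)
  if $\chi(q) = p$ for all $q \in Q$ then return $g$
  $T := Q \setminus \mathrm{Attr}_1^{\mathcal{G}}(\chi^{-1}(p))$; $f :=$ SolveMPP($\mathcal{G}\restriction T$)
  $x := \min(f(T) \cup g(Q))$; $A := \mathrm{Attr}_2^{\mathcal{G}}(f^{-1}(x) \cup g^{-1}(x))$
  return $(Q \to \mathbb{R} \cup \{-\infty\} : q \mapsto x) \sqcup$ SolveMPP($\mathcal{G}\restriction(Q \setminus A)$)
else
  $T := Q \setminus \mathrm{Attr}_2^{\mathcal{G}}(\chi^{-1}(p))$
  if $T = \emptyset$ then return $(Q \to \mathbb{R} \cup \{-\infty\} : q \mapsto -\infty)$
  $f :=$ SolveMPP($\mathcal{G}\restriction T$); $x := \max f(T)$; $A := \mathrm{Attr}_1^{\mathcal{G}}(f^{-1}(x))$
  return $(Q \to \mathbb{R} \cup \{-\infty\} : q \mapsto x) \sqcap$ SolveMPP($\mathcal{G}\restriction(Q \setminus A)$)
end if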



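1. The minimal priority $p$ is even. Since $\mathrm{val}^{\mathcal{G}\restriction S}(q) \ge x$ for all $q \in S$, also $\mathrm{val}^{(G\restriction S, 0)}(q) \ge x$ for all $q \in S$, which is witnessed by a memoryless strategy $\sigma_M$. Let $A := \mathrm{Attr}_1^{\mathcal{G}\restriction S}(\chi^{-1}(p))$. Since $S \setminus A$ is a 1-trap and $\mathrm{val}^{\mathcal{G}\restriction S}(q) \ge x$ for all $q \in S$, we must also have $\mathrm{val}^{\mathcal{G}\restriction(S \setminus A)}(q) \ge x$ for all $q \in S \setminus A$. Hence, by the induction hypothesis, Check($S \setminus A$) succeeds. Therefore, in order to succeed, Check($S$) only needs to guess a suitable memoryless strategy $\sigma_M$.

2. The minimal priority $p$ is odd. Let $A := \mathrm{Attr}_2^{\mathcal{G}\restriction S}(\chi^{-1}(p))$. We claim that Check($S$) succeeds if it guesses $T := \{q \in S \setminus A \mid \mathrm{val}^{\mathcal{G}\restriction(S \setminus A)}(q) \ge x\}$. By Lemma 8, the set $T$ is nonempty. Note that $T$ is a 2-trap and that $\mathrm{val}^{\mathcal{G}\restriction T}(q) \ge x$ for all $q \in T$. Hence, by the induction hypothesis, Check($T$) succeeds. It remains to be shown that Check($S \setminus \mathrm{Attr}_1^{\mathcal{G}\restriction S}(T)$) succeeds as well. Note that $S \setminus \mathrm{Attr}_1^{\mathcal{G}\restriction S}(T)$ is a 1-trap, which together with $\mathrm{val}^{\mathcal{G}\restriction S}(q) \ge x$ for all $q \in S$ implies that $\mathrm{val}^{\mathcal{G}\restriction(S \setminus \mathrm{Attr}_1^{\mathcal{G}\restriction S}(T))}(q) \ge x$ for all $q \in S \setminus \mathrm{Attr}_1^{\mathcal{G}\restriction S}(T)$. Hence, the induction hypothesis yields that Check($S \setminus \mathrm{Attr}_1^{\mathcal{G}\restriction S}(T)$) succeeds. □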

3.4 A deterministic algorithm 

In this section, we present a deterministic algorithm for computing the values of a mean-payoff parity game, which runs faster than all known algorithms for solving these games. Algorithm SolveMPP is based on the classical algorithm for solving parity games, due to Zielonka [22]. The algorithm employs as a subprocedure an algorithm SolveMP for solving mean-payoff games. By [21], such an algorithm can be implemented to run in time $O(n^2 \cdot m \cdot W)$ for a game with $n$ states and $m$ edges. We denote by $f \sqcup g$ and $f \sqcap g$ the pointwise maximum, respectively minimum, of two (partial) functions $f, g\colon Q \to \mathbb{R} \cup \{\pm\infty\}$ (where $(f \sqcup g)(q) = (f \sqcap g)(q) = f(q)$ if $g(q)$ is undefined).

The algorithm works as follows: If the least priority $p$ in $\mathcal{G}$ is even, the algorithm first identifies the least value of $\mathcal{G}$ by computing the values of the mean-payoff game $(G, 0)$ and (recursively) the values of the game $\mathcal{G}\restriction(Q \setminus \mathrm{Attr}_1(\chi^{-1}(p)))$, and taking their minimum $x$. All states from where Player 2 can enforce a visit to a state with value $x$ in one of these two games must have value $x$ in $\mathcal{G}$. In the remaining subarena, the values can be computed by calling SolveMPP recursively. If the least priority is odd, we can similarly compute the greatest value of $\mathcal{G}$ and proceed by recursion.
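The pointwise maximum and minimum of partial functions used in the return statements of SolveMPP can be illustrated with partial functions represented as Python dicts. This sketch assumes, as is the case in SolveMPP, that the domain of the second argument is contained in that of the first; the names `join`, `meet` and the sample values are hypothetical:

```python
NEG_INF = float("-inf")


def join(f, g):
    """Pointwise maximum; where g is undefined the result is f(q)."""
    return {q: max(v, g[q]) if q in g else v for q, v in f.items()}


def meet(f, g):
    """Pointwise minimum; where g is undefined the result is f(q)."""
    return {q: min(v, g[q]) if q in g else v for q, v in f.items()}


# The constant function q -> x on Q, combined with a hypothetical result
# of the recursive call on the smaller subarena {"q1", "q2"}.
Q, x = ["q0", "q1", "q2"], 1
rest = {"q1": 3, "q2": NEG_INF}
print(join({q: x for q in Q}, rest))  # {'q0': 1, 'q1': 3, 'q2': 1}
print(meet({q: x for q in Q}, rest))  # {'q0': 1, 'q1': 1, 'q2': -inf}
```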

Theorem 10. The values of a mean-payoff parity game with $d$ priorities can be computed in time $O(|Q|^{d+2} \cdot |E| \cdot W)$.

Proof. We claim that SolveMPP computes, given a mean-payoff parity game $\mathcal{G}$, the function $\mathrm{val}^{\mathcal{G}}$ in the given time bound. Denote by $T(n, m, d)$ the worst-case running time of the algorithm on a game with $n$ states, $m$ edges and $d$ priorities. Note that, if $\mathcal{G}$ has only one priority, then there are no recursive calls to SolveMPP. Since attractors can be computed in time $O(n + m)$ and the running time of SolveMP is $O(n^2 \cdot m \cdot W)$, there exists a constant $c$ such that the numbers $T(n, m, d)$ satisfy the following recurrence:

$$T(1, m, d) \le c,$$

$$T(n, m, 1) \le c \cdot n^2 \cdot m \cdot W,$$

$$T(n, m, d) \le T(n-1, m, d-1) + T(n-1, m, d) + c \cdot n^2 \cdot m \cdot W.$$

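We claim that $T(n, m, d) \le c \cdot (n+1)^{d+2} \cdot m \cdot W \in O(n^{d+2} \cdot m \cdot W)$. The claim is clearly true if $n = 1$. Hence, assume that $n \ge 2$ and that the claim is true for all lower values of $n$. If $d = 1$, the claim follows from the second inequality. Otherwise,

$$\begin{aligned}
T(n, m, d) &\le T(n-1, m, d-1) + T(n-1, m, d) + c \cdot n^2 \cdot m \cdot W\\
&\le c \cdot n^{d+1} \cdot m \cdot W + c \cdot n^{d+2} \cdot m \cdot W + c \cdot n^2 \cdot m \cdot W\\
&= c \cdot (n^{d+1} + n^2 + n \cdot n^{d+1}) \cdot m \cdot W\\
&\le c \cdot \bigl((n+1)^{d+1} + n \cdot (n+1)^{d+1}\bigr) \cdot m \cdot W\\
&= c \cdot (n+1)^{d+2} \cdot m \cdot W,
\end{aligned}$$

where the fourth line uses $n^{d+1} + n^2 \le (n+1)^{d+1}$, which holds because $d \ge 2$ in this case.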
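The bound can also be sanity-checked numerically by evaluating the recurrence with equality for $c = m = W = 1$; this is an illustration on a finite grid, not a proof, and `worst_case` is a hypothetical name:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def worst_case(n, d):
    """Recurrence from the proof, taken with equality and c = m = W = 1."""
    if n == 1:
        return 1
    if d == 1:
        return n * n
    return worst_case(n - 1, d - 1) + worst_case(n - 1, d) + n * n


# Compare against the closed-form bound (n + 1)^(d + 2) on a small grid.
for n in range(1, 60):
    for d in range(1, 8):
        assert worst_case(n, d) <= (n + 1) ** (d + 2), (n, d)
print("bound holds on the tested grid")
```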

It remains to be proved that the algorithm is correct, i.e. that SolveMPP($\mathcal{G}$) $= \mathrm{val}^{\mathcal{G}}$. We prove the claim by induction over the number of states. If there are no states, the claim is trivial. Hence, assume that $Q \neq \emptyset$ and that the claim is true for all games with fewer than $|Q|$ states. Let $p := \min\{\chi(q) \mid q \in Q\}$. We only consider the case that $p$ is even. If $p$ is odd, the proof is similar, but relies on Lemma 8 instead of Lemma 7.

Let $T$, $f$, $g$, $x$ and $A$ be defined as in the corresponding case of the algorithm, and let $f^* =$ SolveMPP($\mathcal{G}$). If $\chi(Q) = \{p\}$, then $f^* = g = \mathrm{val}^{(G,0)} = \mathrm{val}^{\mathcal{G}}$, and the claim is fulfilled. Otherwise, by the definition of $x$ and applying the induction hypothesis to the game $\mathcal{G}\restriction T$, we have $\mathrm{val}^{(G,0)}(q) \ge x$ for all $q \in Q$ and $\mathrm{val}^{\mathcal{G}\restriction T}(q) = f(q) \ge x$ for all $q \in T$. Hence, Lemma 7 yields that $\mathrm{val}^{\mathcal{G}}(q) \ge x$ for all $q \in Q$. On the other hand, from any state $q \in A$ Player 2 can play an attractor strategy to $f^{-1}(x) \cup g^{-1}(x)$, followed by an optimal strategy in the game $\mathcal{G}\restriction T$, respectively in the mean-payoff game $(G, 0)$, which ensures that Player 1's payoff does not exceed $x$. Hence, $\mathrm{val}^{\mathcal{G}}(q) = x = f^*(q)$ for all $q \in A$.

Now, let $q \in Q \setminus A$. We already know that $\mathrm{val}^{\mathcal{G}}(q) \ge x$. Moreover, since $Q \setminus A$ is a 2-trap, applying the induction hypothesis to the game $\mathcal{G}\restriction(Q \setminus A)$ gives $\mathrm{val}^{\mathcal{G}}(q) \ge \mathrm{val}^{\mathcal{G}\restriction(Q \setminus A)}(q) =$ SolveMPP($\mathcal{G}\restriction(Q \setminus A)$)($q$). Hence, $\mathrm{val}^{\mathcal{G}}(q) \ge f^*(q)$. To see that $\mathrm{val}^{\mathcal{G}}(q) \le f^*(q)$, consider the strategy $\tau$ of Player 2 that mimics an optimal strategy in $\mathcal{G}\restriction(Q \setminus A)$ as long as the play stays in $Q \setminus A$ and switches to an optimal strategy in $\mathcal{G}$ as soon as the play reaches $A$. We have $\mathrm{val}^{\mathcal{G}}(\tau, q) \le \max\{\mathrm{val}^{\mathcal{G}\restriction(Q \setminus A)}(q), x\} = f^*(q)$. □

Algorithm SolveMPP is faster and conceptually simpler than the original algorithm proposed for solving mean-payoff parity games [S]. Compared to the recent algorithm proposed by Chatterjee and Doyen [B], which uses a reduction to energy parity games and runs in time $O(|Q|^{d+4} \cdot |E| \cdot d \cdot W)$, our algorithm has three main advantages: 1. it is faster; 2. it operates directly on mean-payoff parity games; and 3. it is more flexible since it computes the values exactly instead of just comparing them to an integer threshold.

4 Mean-penalty parity games 

In this second part of the paper, we define multi-strategies and mean-penalty parity games. We reduce these games to mean-payoff parity games, show that their value problem is in NP ∩ coNP, and propose a deterministic algorithm for computing the values, which runs in pseudo-polynomial time if the number of priorities is bounded.

4.1 Definitions 

Syntactically, a mean-penalty parity game is a mean-payoff parity game with non-negative weights, i.e. a tuple $\mathcal{G} = (G, \chi)$, where $G = (Q_1, Q_2, E, \mathrm{weight})$ is a weighted game graph with $\mathrm{weight}\colon E \to \mathbb{R}_{\ge 0}$ (or $\mathrm{weight}\colon E \to \mathbb{N}$ for algorithmic purposes), and $\chi\colon Q \to \mathbb{N}$ is a priority function assigning a priority to every state. As for mean-payoff parity games, a play $\rho$ is parity-winning if the minimal priority occurring infinitely often ($\min\{\chi(q) \mid q \in \mathrm{Inf}(\rho)\}$) is even.

Since we are interested in controller synthesis, we define multi-strategies only for Player 1 (who represents the system). Formally, a multi-strategy (for Player 1) in $\mathcal{G}$ is a function $\sigma\colon Q^*Q_1 \to \mathcal{P}(Q) \setminus \{\emptyset\}$ such that $\sigma(\gamma q) \subseteq qE$ for all $\gamma \in Q^*$ and $q \in Q_1$. A play $\rho$ of $\mathcal{G}$ is consistent with a multi-strategy $\sigma$ if $\rho(k+1) \in \sigma(\rho[0,k])$ for all $k \in \mathbb{N}$ with $\rho(k) \in Q_1$, and we denote by $\mathrm{Out}^{\mathcal{G}}(\sigma, q_0)$ the set of all plays $\rho$ of $\mathcal{G}$ that are consistent with $\sigma$ and start in $\rho(0) = q_0$.

Fig. 2. A mean-penalty parity game
Fig. 3. The corresponding mean-payoff parity game

Note that, unlike for deterministic strategies, there is, in general, no unique play consistent with a multi-strategy $\sigma$ for Player 1 and a (deterministic) strategy $\tau$ for Player 2 from a given initial state. Finally, note that every deterministic strategy can be viewed as a multi-strategy.

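Let $\mathcal{G}$ be a mean-penalty parity game, and let $\sigma$ be a multi-strategy. We inductively define $\mathrm{penalty}_\sigma(\gamma)$ (the total penalty of $\gamma$ wrt. $\sigma$) for all $\gamma \in Q^*$ by setting $\mathrm{penalty}_\sigma(\varepsilon) = 0$ as well as $\mathrm{penalty}_\sigma(\gamma q) = \mathrm{penalty}_\sigma(\gamma)$ if $q \in Q_2$ and

$$\mathrm{penalty}_\sigma(\gamma q) = \mathrm{penalty}_\sigma(\gamma) + \sum_{q' \in qE \setminus \sigma(\gamma q)} \mathrm{weight}(q, q')$$

if $q \in Q_1$. Hence, $\mathrm{penalty}_\sigma(\gamma)$ is the total weight of transitions blocked by $\sigma$ along $\gamma$. The mean penalty of an infinite play $\rho$ is then defined as the average penalty that is incurred along this play in the limit, i.e.

$$\mathrm{penalty}^{\mathcal{G}}_\sigma(\rho) = \begin{cases} \limsup_{n \to \infty} \frac{1}{n}\, \mathrm{penalty}_\sigma(\rho[0,n)) & \text{if } \rho \text{ is parity-winning,}\\ \infty & \text{otherwise.} \end{cases}$$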
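To make the definition of $\mathrm{penalty}_\sigma$ concrete, here is a small sketch that sums the weights of blocked transitions along a finite prefix. The multi-strategy is modelled (our assumption) as a Python function from histories to sets of allowed successors, and the two-state game is hypothetical, not the game of Fig. 2:

```python
def prefix_penalty(prefix, player1, succ, weight, sigma):
    """Total penalty of a finite prefix (a list of states) w.r.t. sigma.

    sigma(history) returns the successors allowed after `history`, a tuple
    ending in a Player 1 state; at every Player 1 position, the weights of
    the blocked successors are added.
    """
    total = 0
    for i, q in enumerate(prefix):
        if q in player1:
            allowed = sigma(tuple(prefix[: i + 1]))
            total += sum(weight[(q, r)] for r in succ[q] if r not in allowed)
    return total


def block_loop(history):
    return {"b"}  # always block the self-loop of "a"


def allow_all(history):
    return {"a", "b"}  # block nothing


# Hypothetical game: Player 1 state "a" has a self-loop of weight 2 and an
# edge of weight 1 to Player 2 state "b", which always returns to "a".
succ = {"a": ["a", "b"], "b": ["a"]}
weight = {("a", "a"): 2, ("a", "b"): 1, ("b", "a"): 0}
prefix = ["a", "b", "a", "b"]
print(prefix_penalty(prefix, {"a"}, succ, weight, block_loop))  # 4
print(prefix_penalty(prefix, {"a"}, succ, weight, allow_all))   # 0
```

Dividing by the prefix length gives the running average whose limit superior defines the mean penalty; here it is $4/4 = 1$.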

The mean penalty of a multi-strategy $\sigma$ from a given initial state $q_0$ is defined as the supremum over the mean penalties of all plays that are consistent with $\sigma$, i.e.

$$\mathrm{penalty}^{\mathcal{G}}(\sigma, q_0) = \sup\{\mathrm{penalty}^{\mathcal{G}}_\sigma(\rho) \mid \rho \in \mathrm{Out}^{\mathcal{G}}(\sigma, q_0)\}.$$

The value of a state $q_0$ in a mean-penalty parity game $\mathcal{G}$ is the least mean penalty that a multi-strategy of Player 1 can achieve, i.e. $\mathrm{val}^{\mathcal{G}}(q_0) = \inf_\sigma \mathrm{penalty}^{\mathcal{G}}(\sigma, q_0)$, where $\sigma$ ranges over all multi-strategies of Player 1. A multi-strategy $\sigma$ is called optimal if $\mathrm{penalty}^{\mathcal{G}}(\sigma, q_0) = \mathrm{val}^{\mathcal{G}}(q_0)$ for all $q_0 \in Q$.

Finally, the value problem for mean-penalty parity games is the following decision problem: Given a mean-penalty parity game $\mathcal{G} = (G, \chi)$, an initial state $q_0 \in Q$, and a number $x \in \mathbb{Q}$, decide whether $\mathrm{val}^{\mathcal{G}}(q_0) \le x$.

Example 11. Fig. 2 represents a mean-penalty parity game. Note that weights of transitions out of Player 2 states are not indicated as they are irrelevant for the mean penalty. In this game, Player 1 (controlling circle states) has to regularly block the self-loop if she wants to enforce infinitely many visits to the state with priority 0. This comes with a penalty of 2. However, the multi-strategy in which she blocks no transition can be played safely for an arbitrary number of steps. Hence Player 1 can win with mean penalty 0 (but infinite memory), by blocking the self-loop once every $k$ moves, where $k$ grows with the number of visits to $q_2$.
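The arithmetic behind this infinite-memory strategy can be illustrated with a simplified schedule (our assumption, abstracting away the game graph of Fig. 2): in round $j$, Player 1 blocks the self-loop once, paying 2, and then blocks nothing for $j$ steps. The average penalty per step at the end of each round then decreases towards 0; `average_penalty_after` is a hypothetical name:

```python
from fractions import Fraction


def average_penalty_after(rounds):
    """Average penalty per step after `rounds` rounds of the schedule
    'block once (penalty 2), then allow everything for j steps' in round j."""
    steps = sum(1 + j for j in range(1, rounds + 1))
    return Fraction(2 * rounds, steps)


for r in (10, 100, 1000):
    print(r, float(average_penalty_after(r)))
```

Because the blocking steps become increasingly rare, the limit superior of these averages is 0, while any finite-memory schedule that blocks once every $k$ moves for a fixed $k$ keeps a positive average of about $2/k$.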

4.2 Strategy complexity 

In order to solve mean-penalty games, we reduce them to mean-payoff parity games. We construct from a given mean-penalty parity game $\mathcal{G}$ an exponential-size mean-payoff parity game $\mathcal{G}'$, similar to [3] but with an added priority function. Formally, for a mean-penalty parity game $\mathcal{G} = (G, \chi)$ with game graph $G = (Q_1, Q_2, E, \mathrm{weight})$, the game graph $G' = (Q'_1, Q'_2, E', \mathrm{weight}')$ of the corresponding mean-payoff parity game $\mathcal{G}'$ is defined as follows:

– $Q'_1 = Q_1$ and $Q'_2 = Q_2 \cup \hat{Q}$, where $\hat{Q} := \{(q, F) \mid q \in Q,\ \emptyset \neq F \subseteq qE\}$;

– $E'$ is the (disjoint) union of three kinds of transitions:

(1) transitions of the form $(q, (q, F))$ for each $q \in Q_1$ and $\emptyset \neq F \subseteq qE$,

(2) transitions of the form $(q, (q, \{q'\}))$ for each $q \in Q_2$ and $q' \in qE$,

(3) transitions of the form $((q, F), q')$ for each $(q, F) \in \hat{Q}$ and $q' \in F$;

– the weight function $\mathrm{weight}'$ assigns $0$ to transitions of types (2) and (3), but $\mathrm{weight}'(q, (q, F)) = -2 \sum_{q' \in qE \setminus F} \mathrm{weight}(q, q')$ to transitions of type (1).

Finally, the priority function $\chi'$ of $\mathcal{G}'$ coincides with $\chi$ on $Q$ and assigns priority $M := \max\{\chi(q) \mid q \in Q\}$ to all states in $\hat{Q}$.
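A direct, exponential-size construction of $G'$ can be sketched as follows, assuming states are hashable Python values, `succ[q]` lists the successors of `q`, and `weight` maps edges to non-negative integers; states of $\hat{Q}$ are represented (our choice) as pairs `(q, frozenset(F))`. The function names are hypothetical:

```python
from itertools import combinations


def nonempty_subsets(xs):
    xs = sorted(set(xs))
    for size in range(1, len(xs) + 1):
        for combo in combinations(xs, size):
            yield frozenset(combo)


def build_exponential_game(Q1, Q2, succ, weight, prio):
    """Construct G' and chi' from a mean-penalty game.

    Returns (Q1', Q2', edges, prio'), where edges maps (u, v) to weight'(u, v).
    """
    states = list(Q1) + list(Q2)
    hat = {(q, F) for q in states for F in nonempty_subsets(succ[q])}
    edges = {}
    for q in Q1:  # type (1): Player 1 chooses the set F of allowed successors
        for F in nonempty_subsets(succ[q]):
            blocked = sum(weight[(q, r)] for r in set(succ[q]) - F)
            edges[(q, (q, F))] = -2 * blocked
    for q in Q2:  # type (2): Player 2 states only lead to singletons
        for r in succ[q]:
            edges[(q, (q, frozenset([r])))] = 0
    for (q, F) in hat:  # type (3): Player 2 picks a successor inside F
        for r in F:
            edges[((q, F), r)] = 0
    M = max(prio.values())
    prio2 = {**prio, **{h: M for h in hat}}
    return set(Q1), set(Q2) | hat, edges, prio2
```

The number of states of $\hat{Q}$ grows exponentially with the out-degree, which is what motivates the polynomial-size construction of Sect. 4.3.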

Example 12. Fig. 3 depicts the mean-payoff parity game obtained from the mean-penalty parity game from Example 11 depicted in Fig. 2.

The correspondence between $\mathcal{G}$ and $\mathcal{G}'$ is expressed in the following lemma.

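Lemma 13. Let $\mathcal{G}$ be a mean-penalty parity game, $\mathcal{G}'$ the corresponding mean-payoff parity game, and $q_0 \in Q$.

1. For every multi-strategy $\sigma$ in $\mathcal{G}$ there exists a strategy $\sigma'$ for Player 1 in $\mathcal{G}'$ such that $\mathrm{val}(\sigma', q_0) \ge -\mathrm{penalty}(\sigma, q_0)$.

2. For every strategy $\sigma'$ for Player 1 in $\mathcal{G}'$ there exists a multi-strategy $\sigma$ in $\mathcal{G}$ such that $\mathrm{penalty}(\sigma, q_0) \le -\mathrm{val}(\sigma', q_0)$.

3. $\mathrm{val}^{\mathcal{G}'}(q_0) = -\mathrm{val}^{\mathcal{G}}(q_0)$.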

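Proof. Clearly, 3. is implied by 1. and 2., and we only need to prove the first two statements. To prove 1., let $\sigma$ be a multi-strategy in $\mathcal{G}$. For a play prefix $\gamma = q_0 (q_0, F_0) \cdots q_n (q_n, F_n)$ in $\mathcal{G}'$, let $\bar\gamma := q_0 \cdots q_n$ be the corresponding play prefix in $\mathcal{G}$. We set $\sigma'(\gamma q) = (q, F)$ if $q \in Q_1$ and $\sigma(\bar\gamma q) = F$. Clearly, for each $\rho' \in \mathrm{Out}(\sigma', q_0)$ there exists a play $\rho \in \mathrm{Out}(\sigma, q_0)$ with $-\mathrm{penalty}_\sigma(\rho) = \mathrm{payoff}(\rho')$ (namely $\rho(i) = \rho'(2i)$ for all $i \in \mathbb{N}$). Hence,

$$\begin{aligned}
\mathrm{val}^{\mathcal{G}'}(\sigma', q_0) &= \inf\{\mathrm{payoff}(\rho') \mid \rho' \in \mathrm{Out}(\sigma', q_0)\}\\
&\ge \inf\{-\mathrm{penalty}_\sigma(\rho) \mid \rho \in \mathrm{Out}(\sigma, q_0)\}\\
&= -\sup\{\mathrm{penalty}_\sigma(\rho) \mid \rho \in \mathrm{Out}(\sigma, q_0)\}\\
&= -\mathrm{penalty}(\sigma, q_0).
\end{aligned}$$

To prove 2., let $\sigma'$ be a strategy for Player 1 in $\mathcal{G}'$. For a play prefix $\gamma = q_0 \cdots q_n$ in $\mathcal{G}$, we inductively define the corresponding play prefix $\hat\gamma$ in $\mathcal{G}'$ by setting $\hat{q} = q$ and $\widehat{\gamma q} = \hat\gamma \cdot \sigma'(\hat\gamma) \cdot q$. We set $\sigma(\gamma) = F$ if $\sigma'(\hat\gamma) = (q, F)$. For each $\rho \in \mathrm{Out}(\sigma, q_0)$ there exists a play $\rho' \in \mathrm{Out}(\sigma', q_0)$ with $\mathrm{penalty}_\sigma(\rho) = -\mathrm{payoff}(\rho')$, namely the play $\rho'$ defined by $\rho'(2i) = \rho(i)$ and

$$\rho'(2i+1) = \begin{cases} (\rho(i), \sigma(\rho[0,i])) & \text{if } \rho(i) \in Q_1,\\ (\rho(i), \{\rho(i+1)\}) & \text{if } \rho(i) \in Q_2, \end{cases}$$

for all $i \in \mathbb{N}$. Hence,

$$\begin{aligned}
\mathrm{penalty}(\sigma, q_0) &= \sup\{\mathrm{penalty}_\sigma(\rho) \mid \rho \in \mathrm{Out}(\sigma, q_0)\}\\
&\le \sup\{-\mathrm{payoff}(\rho') \mid \rho' \in \mathrm{Out}(\sigma', q_0)\}\\
&= -\inf\{\mathrm{payoff}(\rho') \mid \rho' \in \mathrm{Out}(\sigma', q_0)\}\\
&= -\mathrm{val}^{\mathcal{G}'}(\sigma', q_0). \qquad \square
\end{aligned}$$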

It follows from Theorem 3 and Lemma 13 that every mean-penalty parity game admits an optimal multi-strategy.

Corollary 14. In every mean-penalty parity game, Player 1 has an optimal multi-strategy.

We now show that Player 2 has a memoryless optimal strategy of a special kind in the mean-payoff parity game derived from a mean-penalty parity game. This puts the value problem for mean-penalty parity games into coNP, and is also a crucial point in the proof of Lemma 17 below.

Lemma 15. Let $\mathcal{G}$ be a mean-penalty parity game and $\mathcal{G}'$ the corresponding mean-payoff parity game. Then in $\mathcal{G}'$ there is a memoryless optimal strategy $\tau'$ for Player 2 such that for every $q \in Q$ there exists a total order $\le_q$ on the set $qE$ with $\tau'((q, F)) = \min_{\le_q} F$ for every state $(q, F) \in \hat{Q}$.

Proof. Let $\tau$ be a memoryless optimal strategy for Player 2 in $\mathcal{G}'$. For a state $q$, we consider the set $qE$ and order it in the following way. We inductively define $F_1 = qE$, $q_i = \tau((q, F_i))$ and $F_{i+1} = F_i \setminus \{q_i\}$ for every $1 \le i \le |qE|$. Note that $\{q_1, \dots, q_{|qE|}\} = qE$. We set $q_1 <_q q_2 <_q \cdots <_q q_{|qE|}$ and define a new memoryless strategy $\tau'$ for Player 2 in $\mathcal{G}'$ by $\tau'((q, F)) = \min_{\le_q} F$ for $(q, F) \in \hat{Q}$ and $\tau'(q) = \tau(q)$ for all $q \in Q_2$. To prove the lemma, we have to show that $\tau'$ is at least as good as $\tau$ and thus optimal.

Let $q_0 \in Q$ and $\rho' \in \mathrm{Out}(\tau', q_0)$. We construct a play $\rho \in \mathrm{Out}(\tau, q_0)$ with $\mathrm{payoff}(\rho) \ge \mathrm{payoff}(\rho')$ in the following way. For every position $i$ with $\rho'(i) = (q, F')$, let $F := \{q'' \in qE \mid \tau'((q, F')) \le_q q''\}$ (then $\tau((q, F)) = \tau'((q, F'))$ by the definition of $\tau'$) and set $\rho(i) = (q, F)$. For every other position $i$, let $\rho(i) = \rho'(i)$. Note that $\rho \in \mathrm{Out}(\tau, q_0)$ and $\min \chi(\mathrm{Inf}(\rho)) = \min \chi(\mathrm{Inf}(\rho'))$. Moreover, we have $F' \subseteq F$ and therefore $\mathrm{weight}'(q, (q, F')) \le \mathrm{weight}'(q, (q, F))$ whenever $\rho'(i) = (q, F')$ and $\rho(i) = (q, F)$ (because weights in $\mathcal{G}$ are nonnegative). Hence, $\mathrm{payoff}(\rho) \ge \mathrm{payoff}(\rho')$. Since $\rho'$ was chosen arbitrarily, it follows that

$$\begin{aligned}
\mathrm{val}(\tau, q_0) &= \sup\{\mathrm{payoff}(\rho) \mid \rho \in \mathrm{Out}(\tau, q_0)\}\\
&\ge \sup\{\mathrm{payoff}(\rho') \mid \rho' \in \mathrm{Out}(\tau', q_0)\}\\
&= \mathrm{val}(\tau', q_0).
\end{aligned}$$

Hence, $\tau'$ is optimal. □



4.3 Computational complexity 

In order to put the value problem for mean-penalty parity games into NP ∩ coNP, we propose a more sophisticated reduction from mean-penalty parity games to mean-payoff parity games, which results in a polynomial-size mean-payoff parity game. Intuitively, in a state $q \in Q_1$ we ask Player 1 consecutively for each outgoing transition whether he wants to block that transition. If he allows a transition, then Player 2 has to decide whether she wishes to explore this transition. Finally, after all transitions have been processed in this way, the play proceeds along the last transition that Player 2 has desired to explore.

Formally, let us fix a mean-penalty parity game $\mathcal{G} = (G, \chi)$ with game graph $G = (Q_1, Q_2, E, \mathrm{weight})$, and denote by $k := \max\{|qE| \mid q \in Q\}$ the maximal out-degree of a state. Then the polynomial-size mean-payoff parity game $\mathcal{G}''$ has vertices of the form $q$ and $(q, a, i, m)$, where $q \in Q$, $a \in \{\mathrm{choose}, \mathrm{allow}, \mathrm{block}\}$, $i \in \{1, \dots, k+1\}$ and $m \in \{0, \dots, k\}$; vertices of the form $q$ and $(q, \mathrm{choose}, i, m)$ belong to Player 1, while vertices of the form $(q, \mathrm{allow}, i, m)$ or $(q, \mathrm{block}, i, m)$ belong to Player 2. To describe the transition structure of $\mathcal{G}''$, let $q \in Q$ and assume that $qE = \{q_1, \dots, q_k\}$ (a state may occur more than once in this list). Then the following transitions originate in a state of the form $q$ or $(q, a, i, m)$:

1. a transition from $q$ to $(q, \mathrm{choose}, 1, 0)$ with weight $0$,

2. for all $1 \le i \le k$ and $0 \le m \le k$, a transition from $(q, \mathrm{choose}, i, m)$ to $(q, \mathrm{allow}, i, m)$ with weight $0$,

3. if $q \in Q_1$, then for all $1 \le i \le k$ and $0 \le m \le k$, a transition from $(q, \mathrm{choose}, i, m)$ to $(q, \mathrm{block}, i, m)$ with weight $0$, except if $i = k$ and $m = 0$;

4. for all $0 \le m \le k$, a transition from $(q, \mathrm{choose}, k+1, m)$ to $q_m$ with weight $0$ (where $q_0$ can be chosen arbitrarily),

5. for all $1 \le i \le k$ and $0 \le m \le k$, a transition from $(q, \mathrm{allow}, i, m)$ to $(q, \mathrm{choose}, i+1, i)$ with weight $0$,

6. for all $1 \le i \le k$ and $1 \le m \le k$, a transition from $(q, \mathrm{allow}, i, m)$ to $(q, \mathrm{choose}, i+1, m)$ with weight $0$,

7. for all $1 \le i \le k$ and $0 \le m \le k$, a transition from $(q, \mathrm{block}, i, m)$ to $(q, \mathrm{choose}, i+1, m)$ with weight $-2(k+1) \cdot \mathrm{weight}(q, q_i)$.

Fig. 4. The game $\mathcal{G}''$ associated with the game $\mathcal{G}$ of Fig. 2

Finally, the priority of a state $q \in Q$ equals the priority of the same state in $\mathcal{G}$, whereas all states of the form $(q, a, i, m)$ have priority $M = \max\{\chi(q) \mid q \in Q\}$.
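The seven kinds of transitions can be generated mechanically. In this sketch, we assume (our choice) that each `succ_list[q]` already enumerates $qE$ with exactly $k$ entries, padding by repetition as permitted above, that state names are not tuples, and that vertices $(q, a, i, m)$ are represented as Python tuples; `build_polynomial_game` is a hypothetical name:

```python
def build_polynomial_game(Q1, Q2, succ_list, weight, prio):
    """Construct the polynomial-size game following rules 1-7 above.

    succ_list[q] is the enumeration q_1, ..., q_k of qE (exactly k entries).
    Returns (owner, edges, prio''), with edges mapping (u, v) to weights.
    """
    states = list(Q1) + list(Q2)
    k = max(len(succ_list[q]) for q in states)
    edges = {}
    for q in states:
        qs = succ_list[q]
        assert len(qs) == k, "each qE must be listed with exactly k entries"
        edges[(q, (q, "choose", 1, 0))] = 0                                   # rule 1
        for i in range(1, k + 1):
            for m in range(0, k + 1):
                c, a, b = (q, "choose", i, m), (q, "allow", i, m), (q, "block", i, m)
                edges[(c, a)] = 0                                             # rule 2
                if q in Q1 and not (i == k and m == 0):
                    edges[(c, b)] = 0                                         # rule 3
                edges[(a, (q, "choose", i + 1, i))] = 0                       # rule 5
                if m >= 1:
                    edges[(a, (q, "choose", i + 1, m))] = 0                   # rule 6
                penalty = -2 * (k + 1) * weight[(q, qs[i - 1])]
                edges[(b, (q, "choose", i + 1, m))] = penalty                 # rule 7
        for m in range(0, k + 1):
            target = qs[m - 1] if m >= 1 else qs[0]  # q_0 chosen as q_1
            edges[((q, "choose", k + 1, m), target)] = 0                      # rule 4
    vertices = {u for u, _ in edges} | {v for _, v in edges}
    M = max(prio.values())
    owner = {v: 1 if v in prio or v[1] == "choose" else 2 for v in vertices}
    prio2 = {v: prio[v] if v in prio else M for v in vertices}
    return owner, edges, prio2
```

For two states and $k = 2$, this yields $2 \cdot 22 = 44$ vertices, in line with a number of vertices that is quadratic in $k$ and linear in $|Q|$.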

Example 16. For the game of Fig. 2, this transformation would yield the game depicted in Fig. 4. In this picture, a, b and c stand for allow, block and choose, respectively; zero weights are omitted.

It is easy to see that the game $\mathcal{G}''$ has polynomial size and can, in fact, be constructed in polynomial time from the given mean-penalty parity game $\mathcal{G}$. The following lemma relates the game $\mathcal{G}''$ to the mean-payoff parity game $\mathcal{G}'$ of exponential size constructed in Sect. 4.2 and to the original game $\mathcal{G}$.

Lemma 17. Let $\mathcal{G}$ be a mean-penalty parity game, $\mathcal{G}'$ the corresponding mean-payoff parity game of exponential size, $\mathcal{G}''$ the corresponding mean-payoff parity game of polynomial size, and $q_0 \in Q$.

1. For every multi-strategy $\sigma$ in $\mathcal{G}$ there exists a strategy $\sigma'$ for Player 1 in $\mathcal{G}''$ such that $\mathrm{val}(\sigma', q_0) \ge -\mathrm{penalty}(\sigma, q_0)$.

2. For every strategy $\tau$ for Player 2 in $\mathcal{G}'$ there exists a strategy $\tau'$ for Player 2 in $\mathcal{G}''$ such that $\mathrm{val}(\tau', q_0) \le \mathrm{val}(\tau, q_0)$.

3. $\mathrm{val}^{\mathcal{G}''}(q_0) = -\mathrm{val}^{\mathcal{G}}(q_0)$.

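Proof. To prove 1., let $\sigma$ be a multi-strategy in $\mathcal{G}$. For any play prefix $\gamma$ in $\mathcal{G}''$, let $\bar\gamma$ be the projection to states in $Q$ (i.e. all states of the form $(q, a, i, m)$ are omitted). Assuming that $q_1, \dots, q_k$ is the enumeration of $qE$ used in the definition of $\mathcal{G}''$, we set $\sigma'(\gamma \cdot (q, \mathrm{choose}, i, m)) = (q, \mathrm{allow}, i, m)$ if (and only if) either $q \in Q_1$ and $q_i \in \sigma(\bar\gamma)$, or $q \in Q_2$. It is easy to see that for each $\rho' \in \mathrm{Out}(\sigma', q_0)$ there exists a play $\rho \in \mathrm{Out}(\sigma, q_0)$ with $-\mathrm{penalty}_\sigma(\rho) = \mathrm{payoff}(\rho')$. Hence,

$$\begin{aligned}
\mathrm{val}(\sigma', q_0) &= \inf\{\mathrm{payoff}(\rho') \mid \rho' \in \mathrm{Out}(\sigma', q_0)\}\\
&\ge \inf\{-\mathrm{penalty}_\sigma(\rho) \mid \rho \in \mathrm{Out}(\sigma, q_0)\}\\
&= -\sup\{\mathrm{penalty}_\sigma(\rho) \mid \rho \in \mathrm{Out}(\sigma, q_0)\}\\
&= -\mathrm{penalty}(\sigma, q_0).
\end{aligned}$$

To prove 2., let $\tau$ be a strategy for Player 2 in $\mathcal{G}'$. By Lemma 15, there exists a memoryless strategy $\tau^*$ for Player 2 in $\mathcal{G}'$ such that $\mathrm{val}(\tau^*, q_0) \le \mathrm{val}(\tau, q_0)$ and for all $q \in Q$ there exists a total order $\le_q$ on $qE$ with $\tau^*((q, F)) = \min_{\le_q} F$ for all $(q, F) \in \hat{Q}$. We define a memoryless strategy $\tau'$ for Player 2 in $\mathcal{G}''$ as follows: Assume that $q_1, \dots, q_k$ is the enumeration of $qE$ used in the definition of $\mathcal{G}''$. Then we set $\tau'((q, \mathrm{allow}, i, m)) = (q, \mathrm{choose}, i+1, i)$ if (and only if) one of the following three conditions is fulfilled: 1. $m = 0$, or 2. $q \in Q_1$ and $q_i <_q q_m$, or 3. $q \in Q_2$ and $\tau^*(q) = (q, \{q_i\})$. Now it is easy to see that for each $\rho' \in \mathrm{Out}(\tau', q_0)$ there exists a play $\rho \in \mathrm{Out}(\tau^*, q_0)$ with $\mathrm{payoff}(\rho) = \mathrm{payoff}(\rho')$. Hence,

$$\begin{aligned}
\mathrm{val}(\tau', q_0) &= \sup\{\mathrm{payoff}(\rho') \mid \rho' \in \mathrm{Out}(\tau', q_0)\}\\
&\le \sup\{\mathrm{payoff}(\rho) \mid \rho \in \mathrm{Out}(\tau^*, q_0)\}\\
&= \mathrm{val}(\tau^*, q_0)\\
&\le \mathrm{val}(\tau, q_0).
\end{aligned}$$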

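Finally, we prove 3. It follows from 1. that $\mathrm{val}^{\mathcal{G}''}(q_0) \ge -\mathrm{val}^{\mathcal{G}}(q_0)$, and it follows from 2. that $\mathrm{val}^{\mathcal{G}''}(q_0) \le \mathrm{val}^{\mathcal{G}'}(q_0)$. But $\mathrm{val}^{\mathcal{G}'}(q_0) = -\mathrm{val}^{\mathcal{G}}(q_0)$ by Lemma 13, and therefore $\mathrm{val}^{\mathcal{G}''}(q_0) = -\mathrm{val}^{\mathcal{G}}(q_0)$. □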



Since the mean-payoff parity game $\mathcal{G}''$ can be computed from $\mathcal{G}$ in polynomial time, we obtain a polynomial-time many-one reduction from the value problem for mean-penalty parity games to the value problem for mean-payoff parity games. By Corollary 6 and Theorem 9, the latter problem belongs to NP ∩ coNP.

Theorem 18. The value problem for mean-penalty parity games belongs to NP ∩ coNP.



4.4 A deterministic algorithm 

Naturally, we can use the polynomial translation from mean-penalty parity games to mean-payoff parity games to solve mean-penalty parity games deterministically. Note that the mean-payoff parity game $\mathcal{G}''$ derived from a mean-penalty parity game has $O(|Q| \cdot k^2)$ states and $O(|Q| \cdot k^2)$ edges, where $k$ is the maximum out-degree of a state in $\mathcal{G}$; the number of priorities remains the same. Moreover, if weights are given as integers and $W$ is the highest absolute weight in $\mathcal{G}$, then the highest absolute weight in $\mathcal{G}''$ is $O(k \cdot W)$. Using Theorem 10, we thus obtain a deterministic algorithm for solving mean-penalty parity games that runs in time $O(|Q|^{d+3} \cdot k^{2d+7} \cdot W)$. If $k$ is a constant, the running time is $O(|Q|^{d+3} \cdot W)$, which is acceptable. In the general case, however, the best upper bound on $k$ is the number of states, and we get an algorithm that runs in time $O(|Q|^{3d+10} \cdot W)$. Even if the number of priorities is small, this running time would not be acceptable in practical applications.
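The exponents above come from substituting the size parameters of $\mathcal{G}''$ into the bound of Theorem 10. Writing $n''$, $m''$ and $W''$ (our notation) for the number of states, the number of edges and the highest absolute weight of $\mathcal{G}''$, a worked version of the calculation is:

$$\begin{aligned}
O\bigl((n'')^{d+2} \cdot m'' \cdot W''\bigr) &= O\bigl((|Q| k^2)^{d+2} \cdot |Q| k^2 \cdot k W\bigr)\\
&= O\bigl(|Q|^{d+3} \cdot k^{2(d+2) + 2 + 1} \cdot W\bigr) = O\bigl(|Q|^{d+3} \cdot k^{2d+7} \cdot W\bigr),\\
k \le |Q| \;\Longrightarrow\; O\bigl(|Q|^{d+3} \cdot k^{2d+7} \cdot W\bigr) &\subseteq O\bigl(|Q|^{(d+3)+(2d+7)} \cdot W\bigr) = O\bigl(|Q|^{3d+10} \cdot W\bigr).
\end{aligned}$$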

The goal of this section is to show that we can do better; namely, we will give an algorithm that runs in time $O(|Q|^{d+4} \cdot |E| \cdot W)$, independently of the maximum out-degree. The idea is as follows: we use Algorithm SolveMPP on the mean-payoff parity game $\mathcal{G}'$ of exponential size, but we show that we can run it on $\mathcal{G}$, i.e., by handling the extra states of $\mathcal{G}'$ symbolically during the computation. As a first step, we adapt the pseudo-polynomial algorithm by Zwick and Paterson [24] to compute the values of a mean-penalty parity game with a trivial parity objective.

Lemma 19. The values of a mean-penalty parity game with priority function $\chi = 0$ can be computed in time $O(|Q|^4 \cdot |E| \cdot W)$.

Proof Let g ^ (G,x) with G = (Qi, Q2, weight), and g' = (G",x') with 
G' — {Q[,Q'2, E' , weight'). For a state q £ Q', we let vo{q) — 0, and for fc > 0, 
we define 



Vkiq) 



max weight'((7, q') + Vk-i{q') if q e Q[, 

q'&qE' 

min weight'((7, q') + Vk-i{q') ii q e Q'2- 

,q'GqE' 



If q E Q, then the definition of g' yields that 

{max weight' (g, (q, F)) + min Vk-2{.q') if g G Qi, 
mm Vk~2{q ) u q eQ2, 

q'eqE 

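In the first case, a naive computation would require the examination of an exponential number of transitions. To avoid this blow-up, we use the same idea as in the proof of Lemma 15. Let $qE = \{q_1, \dots, q_r\}$ be sorted in such a way that $i < j$ implies $v_{k-2}(q_i) \le v_{k-2}(q_j)$. Since $\mathit{weight}'(q, (q,F)) \le \mathit{weight}'(q, (q,F'))$ whenever $F \subseteq F'$, we have

$$v_k(q) = \max_{1 \le i \le r} \bigl(\mathit{weight}'(q, (q, \{q_i, \dots, q_r\})) + v_{k-2}(q_i)\bigr).$$

Indeed, if $q_i$ is the element of $F$ with the smallest index, then $\min_{q' \in F} v_{k-2}(q') = v_{k-2}(q_i)$ and $F \subseteq \{q_i, \dots, q_r\}$, so replacing $F$ by $\{q_i, \dots, q_r\}$ never decreases the maximised quantity.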
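Hence the values $v_2, v_4, \dots, v_{2k}$ restricted to $Q$ can be computed on $\mathcal{G}$ in time $O(k \cdot |E|)$. Now, despite the exponential size of $\mathcal{G}'$, the length of a simple cycle in $\mathcal{G}'$ is at most $2|Q|$. Hence, Theorem 2.2 in [24] becomes

$$2k \cdot \mathrm{val}^{\mathcal{G}'}(q) - 4|Q| \cdot W' \le v_{2k}(q) \le 2k \cdot \mathrm{val}^{\mathcal{G}'}(q) + 4|Q| \cdot W'$$

for all $q \in Q$, where $W'$ is the maximal absolute weight in $\mathcal{G}'$. Since $W' \le |Q| \cdot 2W$, it follows from [23] that $\mathrm{val}^{\mathcal{G}} = -\mathrm{val}^{\mathcal{G}'} \restriction Q$ can be computed in time $O(|Q|^3 \cdot |E| \cdot W)$. □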
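The key step in the proof of Lemma 19 is evaluating $v_k(q)$ for $q \in Q_1$ with only $|qE|$ candidate sets instead of $2^{|qE|} - 1$. The Python sketch below implements that step and compares it with brute-force enumeration of all nonempty $F \subseteq qE$. The penalty model used for $\mathit{weight}'(q,(q,F))$ (minus the total cost of the blocked edges $qE \setminus F$) is a hypothetical example, chosen only because it satisfies the monotonicity assumption; the function names are illustrative.

```python
from itertools import combinations


def step_sorted(succ, v_prev, weight):
    """v_k(q) for q in Q_1, using only |qE| candidate sets (sorting trick).

    succ:   list of successors qE of q
    v_prev: dict holding v_{k-2} on the states of G
    weight: F -> weight'(q, (q, F)); must be monotone, i.e. F <= F' implies
            weight(F) <= weight(F')
    """
    order = sorted(succ, key=lambda s: v_prev[s])  # v_prev(q_1) <= ... <= v_prev(q_r)
    # On the suffix {q_i, ..., q_r}, the minimum of v_prev is v_prev(q_i).
    return max(weight(frozenset(order[i:])) + v_prev[order[i]]
               for i in range(len(order)))


def step_bruteforce(succ, v_prev, weight):
    """Same value, enumerating all nonempty subsets F of qE."""
    return max(weight(frozenset(F)) + min(v_prev[s] for s in F)
               for r in range(1, len(succ) + 1)
               for F in combinations(succ, r))


def blocking_penalty(succ, cost):
    """Hypothetical weight': minus the total cost of the edges that F blocks."""
    return lambda F: -sum(cost[s] for s in succ if s not in F)


succ = ["a", "b", "c"]
v_prev = {"a": 3, "b": 1, "c": 2}
w = blocking_penalty(succ, {"a": 1, "b": 2, "c": 4})
print(step_sorted(succ, v_prev, w), step_bruteforce(succ, v_prev, w))  # prints: 1 1
```

Both functions agree, but the sorted version touches each successor once after sorting, which is what keeps one round of value iteration on $\mathcal{G}$ close to linear in $|E|$.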
Now, given a mean-penalty parity game $\mathcal{G}$ with associated mean-payoff parity game $\mathcal{G}'$ and a set $T$ of states of $\mathcal{G}$, we define

$$\underline{A}^{\mathcal{G}}(T) = T \cup \{(q,F) \in Q' \mid F \subseteq T\}, \qquad \overline{A}^{\mathcal{G}}(T) = T \cup \{(q,F) \in Q' \mid F \cap T \neq \emptyset\}.$$

We usually omit the superscript $\mathcal{G}$ when it is clear from the context.

Lemma 20. If $S$ is a subarena of $G$, then $\underline{A}(S)$ and $\overline{A}(S)$ are subarenas of $G'$.

Proof. Assume that $S$ is a subarena of $G$, and pick a state $q$ in $\underline{A}(S)$. If $q \in Q$, then it also belongs to $S$ and, as a state of $G$, has a successor $q'$ in $S$. Then $\underline{A}(S)$ contains $(q, \{q'\})$, which is a successor of $q$. If $q \in Q' \setminus Q$, then $qE' \subseteq S$ by definition of $\underline{A}(S)$; hence $q$ has at least one successor in $S$. A similar argument shows that $\overline{A}(S)$ is also a subarena of $G'$. □
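Lemma 20 can be checked exhaustively on small arenas. The Python sketch below builds $G'$ explicitly under an assumed construction (our reading, not quoted from the text: a Player 1 state $q$ moves to $(q,F)$ for every nonempty $F \subseteq qE$, a Player 2 state moves to $(q,\{q'\})$ for every $q' \in qE$, and $(q,F)$ moves to the states of $F$). It implements $\underline{A}$ and $\overline{A}$ as the hypothetical functions `lower_A` and `upper_A` and verifies that both map every subarena of $G$ to a subarena of $G'$.

```python
from itertools import chain, combinations


def subsets(xs):
    """All subsets of xs (including the empty set), as frozensets."""
    xs = sorted(xs)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]


def expand(edges, player1):
    """Edges of G' under the assumed construction; extra states are pairs (q, F)."""
    new_edges = {}
    for q, succ in edges.items():
        if q in player1:
            choices = [F for F in subsets(succ) if F]  # nonempty F of qE
        else:
            choices = [frozenset({s}) for s in succ]  # singletons only
        new_edges[q] = {(q, F) for F in choices}
        for F in choices:
            new_edges[(q, F)] = set(F)  # (q, F) moves to any state of F
    return new_edges


def lower_A(T, states_prime):
    return set(T) | {s for s in states_prime if isinstance(s, tuple) and s[1] <= T}


def upper_A(T, states_prime):
    return set(T) | {s for s in states_prime if isinstance(s, tuple) and s[1] & T}


def is_subarena(S, edges):
    """Every state of S has at least one successor inside S."""
    return all(edges[s] & S for s in S)


edges = {0: {1, 2}, 1: {0, 3}, 2: {2, 3}, 3: {0}}
edges_prime = expand(edges, player1={0, 2})
for S in subsets(edges):
    if is_subarena(S, edges):
        assert is_subarena(lower_A(S, edges_prime), edges_prime)
        assert is_subarena(upper_A(S, edges_prime), edges_prime)
print("Lemma 20 holds on this arena")
```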

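Lemma 21. Let $\mathcal{G}$ be a mean-penalty parity game with associated mean-payoff parity game $\mathcal{G}'$, and let $A, B \subseteq Q$. Then

$$\underline{A}(A \cap B) = \underline{A}(A) \cap \underline{A}(B), \qquad \underline{A}(A \cup B) \supseteq \underline{A}(A) \cup \underline{A}(B),$$
$$\overline{A}(A \cup B) = \overline{A}(A) \cup \overline{A}(B), \qquad \overline{A}(A \cap B) \subseteq \overline{A}(A) \cap \overline{A}(B),$$
$$\underline{A}(Q \setminus A) = Q' \setminus \overline{A}(A), \qquad \overline{A}(Q \setminus A) = Q' \setminus \underline{A}(A).$$

Proof. Straightforward. □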
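The identities of Lemma 21 only depend on the second components $F$ of the extra states, so they can be checked on any collection of pairs $(q,F)$. The Python sketch below does this exhaustively for a three-state set $Q$; the choice of pairs (all $(q,F)$ with $F$ a nonempty subset of $Q$) and the function names are illustrative assumptions.

```python
from itertools import chain, combinations


def subsets(xs):
    """All subsets of xs (including the empty set), as frozensets."""
    xs = sorted(xs)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]


def lower_A(T, pairs):
    """T together with all pairs (q, F) such that F is a subset of T."""
    return set(T) | {(q, F) for (q, F) in pairs if F <= T}


def upper_A(T, pairs):
    """T together with all pairs (q, F) such that F meets T."""
    return set(T) | {(q, F) for (q, F) in pairs if F & T}


Q = frozenset({0, 1, 2})
pairs = {(q, F) for q in Q for F in subsets(Q) if F}  # extra states (q, F)
Q_prime = set(Q) | pairs

for A in subsets(Q):
    for B in subsets(Q):
        assert lower_A(A & B, pairs) == lower_A(A, pairs) & lower_A(B, pairs)
        assert lower_A(A | B, pairs) >= lower_A(A, pairs) | lower_A(B, pairs)
        assert upper_A(A | B, pairs) == upper_A(A, pairs) | upper_A(B, pairs)
        assert upper_A(A & B, pairs) <= upper_A(A, pairs) & upper_A(B, pairs)
    # the two operators are dual to each other under complementation
    assert lower_A(Q - A, pairs) == Q_prime - upper_A(A, pairs)
    assert upper_A(Q - A, pairs) == Q_prime - lower_A(A, pairs)
print("Lemma 21 identities hold")
```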

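Lemma 22. Let $\mathcal{G}$ be a mean-penalty parity game with associated mean-payoff parity game $\mathcal{G}'$, and let $F \subseteq Q$. Then

$$\underline{A}(\mathrm{Attr}_1^{\mathcal{G}}(F)) = \mathrm{Attr}_1^{\mathcal{G}'}(F) = \mathrm{Attr}_1^{\mathcal{G}'}(\underline{A}(F)),$$
$$\overline{A}(\mathrm{Attr}_2^{\mathcal{G}}(F)) = \mathrm{Attr}_2^{\mathcal{G}'}(F) = \mathrm{Attr}_2^{\mathcal{G}'}(\overline{A}(F)).$$

Proof. We only prove the first statement; the second can be proved using similar arguments. Clearly, $\mathrm{Attr}_1^{\mathcal{G}'}(F) = \mathrm{Attr}_1^{\mathcal{G}'}(\underline{A}(F))$, since every state in $\underline{A}(F) \setminus F$ is a Player 2 state all of whose successors lie in $F$; so we only need to prove that $\underline{A}(\mathrm{Attr}_1^{\mathcal{G}}(F)) = \mathrm{Attr}_1^{\mathcal{G}'}(F)$. First pick $q \in \underline{A}(\mathrm{Attr}_1^{\mathcal{G}}(F))$. If $q \in Q$, then the attractor strategy for reaching $F$ can be mimicked in $\mathcal{G}'$, and therefore $q \in \mathrm{Attr}_1^{\mathcal{G}'}(F)$. On the other hand, if $q \in Q' \setminus Q$, then all successors of $q$ lie in $\mathrm{Attr}_1^{\mathcal{G}}(F)$ and therefore also in $\mathrm{Attr}_1^{\mathcal{G}'}(F)$. Hence, $q \in \mathrm{Attr}_1^{\mathcal{G}'}(F)$. Now pick $q \in \mathrm{Attr}_1^{\mathcal{G}'}(F)$. If $q \in Q$, then the attractor strategy for reaching $F$ yields a multi-strategy $\sigma$ in $\mathcal{G}$ such that all plays $\rho \in \mathrm{Out}^{\mathcal{G}}(\sigma, q)$ visit $F$. Hence, $q \in \mathrm{Attr}_1^{\mathcal{G}}(F) \subseteq \underline{A}(\mathrm{Attr}_1^{\mathcal{G}}(F))$. On the other hand, if $q \in Q' \setminus Q$, then all successors of $q$ lie in $Q \cap \mathrm{Attr}_1^{\mathcal{G}'}(F)$ (since $q$ is a Player 2 state) and therefore also in $\mathrm{Attr}_1^{\mathcal{G}}(F)$. Hence, $q \in \underline{A}(\mathrm{Attr}_1^{\mathcal{G}}(F))$. □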
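Lemma 22 is what allows attractors of $\mathcal{G}'$ to be computed on $\mathcal{G}$. The Python sketch below checks all four equalities for every target set $F$ on a small arena, again under the assumed construction of $G'$ used for Lemma 20 (Player 1 states branch to all nonempty $F \subseteq qE$, Player 2 states to singletons, and every extra state $(q,F)$ belongs to Player 2). The attractor is the standard least fixed point; all names are hypothetical.

```python
from itertools import chain, combinations


def subsets(xs):
    """All subsets of xs (including the empty set), as frozensets."""
    xs = sorted(xs)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]


def expand(edges, player1):
    """Edges of G' under the assumed construction; extra states are pairs (q, F)."""
    new_edges = {}
    for q, succ in edges.items():
        if q in player1:
            choices = [F for F in subsets(succ) if F]
        else:
            choices = [frozenset({s}) for s in succ]
        new_edges[q] = {(q, F) for F in choices}
        for F in choices:
            new_edges[(q, F)] = set(F)
    return new_edges


def attractor(edges, owner, player, target):
    """States from which `player` can force a visit to `target`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for s, succ in edges.items():
            if s in attr:
                continue
            if owner(s) == player:
                forced = bool(succ & attr)  # one successor in attr suffices
            else:
                forced = succ <= attr  # the opponent cannot avoid attr
            if forced:
                attr.add(s)
                changed = True
    return attr


def lower_A(T, states_prime):
    return set(T) | {s for s in states_prime if isinstance(s, tuple) and s[1] <= T}


def upper_A(T, states_prime):
    return set(T) | {s for s in states_prime if isinstance(s, tuple) and s[1] & T}


edges = {0: {1, 2}, 1: {0, 3}, 2: {2, 3}, 3: {0}}
player1 = {0, 2}
edges_prime = expand(edges, player1)


def owner_G(s):
    return 1 if s in player1 else 2


def owner_G_prime(s):
    return 2 if isinstance(s, tuple) else owner_G(s)  # (q, F) belongs to Player 2


for F in subsets(edges):
    a1 = attractor(edges_prime, owner_G_prime, 1, F)
    assert lower_A(attractor(edges, owner_G, 1, F), edges_prime) == a1
    assert attractor(edges_prime, owner_G_prime, 1, lower_A(F, edges_prime)) == a1
    a2 = attractor(edges_prime, owner_G_prime, 2, F)
    assert upper_A(attractor(edges, owner_G, 2, F), edges_prime) == a2
    assert attractor(edges_prime, owner_G_prime, 2, upper_A(F, edges_prime)) == a2
print("Lemma 22 holds on this arena")
```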

Algorithm SymbSolveMPP is our algorithm for computing the values of a mean-penalty parity game. It employs as a subroutine an algorithm SymbSolveMP for computing the values of a mean-penalty parity game with a trivial priority function (see Lemma 19). Since SymbSolveMP can be implemented to run in time $O(|Q|^3 \cdot |E| \cdot W)$, the running time of the procedure SymbSolveMPP is $O(|Q|^{d+2} \cdot |E| \cdot W)$. Notably, the algorithm runs in polynomial time if the number of priorities is bounded and we are only interested in the average number of edges blocked by a strategy in each step (i.e. if all weights are equal to 1).

Algorithm SymbSolveMPP(𝒢)

  Input: mean-penalty parity game 𝒢 = (G, χ)
  Output: val^𝒢

  if Q = ∅ then return ∅
  p := min{χ(q) | q ∈ Q}
  if p is even then
      g := SymbSolveMP(G, 0)
      if χ(q) = p for all q ∈ Q then return g
      T := Q \ Attr_1^𝒢(χ^{-1}(p));  f := SymbSolveMPP(𝒢↾T)
      x := max(f(T) ∪ g(Q));  A := Attr_2^𝒢(f^{-1}(x) ∪ g^{-1}(x))
      return (A → ℝ ∪ {∞} : q ↦ x) ∪ SymbSolveMPP(𝒢↾(Q \ A))
  else
      T := Q \ Attr_2^𝒢(χ^{-1}(p))
      if T = ∅ then return (Q → ℝ ∪ {∞} : q ↦ ∞)
      f := SymbSolveMPP(𝒢↾T);  x := min f(T);  A := Attr_1^𝒢(f^{-1}(x))
      return (A → ℝ ∪ {∞} : q ↦ x) ∪ SymbSolveMPP(𝒢↾(Q \ A))
  end if
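Algorithm SymbSolveMPP transcribes almost line by line into Python. The sketch below is illustrative only: the arena is given by successor sets together with `owner` (values 1 and 2) and `prio` dictionaries, and the subroutine SymbSolveMP is replaced by a hypothetical callable `solve_mp` that must return values for all states of the given subarena (a real implementation would use the value iteration from the proof of Lemma 19). Attractors are ordinary attractors computed directly on $G$, as justified by Lemma 22.

```python
import math

INF = math.inf  # value of states where Player 1 cannot satisfy the parity condition


def attractor(states, edges, owner, player, target):
    """Attractor of `player` to `target` inside the subarena induced by `states`."""
    attr = set(target) & states
    changed = True
    while changed:
        changed = False
        for q in states - attr:
            succ = edges[q] & states
            if owner[q] == player:
                forced = bool(succ & attr)  # one good successor suffices
            else:
                forced = succ <= attr  # the opponent cannot escape
            if forced:
                attr.add(q)
                changed = True
    return attr


def symb_solve_mpp(states, edges, owner, prio, solve_mp):
    """Sketch of SymbSolveMPP on the subarena `states` of G.

    solve_mp(states) stands in for SymbSolveMP on G restricted to `states`
    with trivial priority 0, returning a dict of values (floats or INF).
    """
    if not states:
        return {}
    p = min(prio[q] for q in states)
    top = {q for q in states if prio[q] == p}
    if p % 2 == 0:
        g = solve_mp(states)
        if top == states:
            return g
        T = states - attractor(states, edges, owner, 1, top)
        f = symb_solve_mpp(T, edges, owner, prio, solve_mp)
        x = max(list(f.values()) + list(g.values()))
        worst = {q for q in f if f[q] == x} | {q for q in g if g[q] == x}
        A = attractor(states, edges, owner, 2, worst)
    else:
        T = states - attractor(states, edges, owner, 2, top)
        if not T:
            return {q: INF for q in states}
        f = symb_solve_mpp(T, edges, owner, prio, solve_mp)
        x = min(f.values())
        A = attractor(states, edges, owner, 1, {q for q in f if f[q] == x})
    result = {q: x for q in A}
    result.update(symb_solve_mpp(states - A, edges, owner, prio, solve_mp))
    return result


# Player 1 owns state 0 (priority 0) and may stay there forever;
# Player 2 owns state 1 (priority 1), which only loops on itself.
edges = {0: {0, 1}, 1: {1}}
values = symb_solve_mpp({0, 1}, edges, {0: 1, 1: 2}, {0: 0, 1: 1},
                        solve_mp=lambda s: {q: 0.0 for q in s})
print(values)
```

With the zero-penalty stub, the sketch assigns value 0 to state 0 (Player 1 stays there and sees only priority 0) and value ∞ to state 1, where the parity condition is lost. Each recursive call removes a nonempty attractor, which mirrors the termination argument for the pseudocode.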

Theorem 23. The values of a mean-penalty parity game with $d$ priorities can be computed in time $O(|Q|^{d+2} \cdot |E| \cdot W)$.

Proof. From Lemma 19 and with the same runtime analysis as in the proof of Theorem 10, we get that SymbSolveMPP runs in time $O(|Q|^{d+2} \cdot |E| \cdot W)$. We now prove that the algorithm is correct by showing that there is a correspondence between the values the algorithm computes on a mean-penalty parity game $\mathcal{G}$ and the values computed by Algorithm SolveMPP on the mean-payoff parity game $\mathcal{G}'$. More precisely, we show that $\mathrm{SolveMPP}(\mathcal{G}') \restriction Q = -\mathrm{SymbSolveMPP}(\mathcal{G})$. The correctness of the algorithm thus follows from Lemma 13, which states that $\mathrm{val}^{\mathcal{G}'} \restriction Q = -\mathrm{val}^{\mathcal{G}}$.

The proof is by induction on the number of states in $\mathcal{G}$. The result holds trivially if $Q = \emptyset$. Otherwise, assume that the result is true for all games with fewer than $|Q|$ states and let $p = \min\{\chi(q) \mid q \in Q\}$. By construction, $p$ is also the minimal priority in $\mathcal{G}'$. We only consider the case that $p$ is even; the other case is proved using the same arguments.

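Write $g'$, $T'$, $f'$, $x'$ and $A'$ for the items computed by SolveMPP on $\mathcal{G}'$, while $g$, $T$, $f$, $x$ and $A$ are the corresponding items computed by SymbSolveMPP on $\mathcal{G}$. Then $g'(q) = -g(q)$ for all $q \in Q$, and $g'((q,F)) = \min_{q' \in F} g'(q')$ for all $(q,F) \in Q' \setminus Q$ (since such states belong to Player 2). If $\mathcal{G}$ has only one priority, the result follows. Otherwise, by Lemmas 21 and 22, we have $T' = \overline{A}(T)$. However, any state $(q,F) \in T'$ that is not a state of the game $(\mathcal{G} \restriction T)'$ has no predecessor in $\mathcal{G}' \restriction T'$: the only predecessor of $(q,F)$ is $q$, and if $q \in T'$, then $q \in T \cap Q_1$ and $qE \setminus T \neq \emptyset$, i.e. $qE \cap \mathrm{Attr}_1^{\mathcal{G}}(\chi^{-1}(p)) \neq \emptyset$; but then $q \in \mathrm{Attr}_1^{\mathcal{G}}(\chi^{-1}(p))$ and thus $q \notin T$, a contradiction. It follows that $\mathrm{SolveMPP}(\mathcal{G}' \restriction T') \restriction T = \mathrm{SolveMPP}((\mathcal{G} \restriction T)') \restriction T$.

Now, since $T$ is a strict subset of $Q$, the induction hypothesis applies, so that $f'(t) = -f(t)$ for all $t \in T$. It follows that $x' = -x$. Let $S := Q \setminus A$ and $S' := Q' \setminus A'$. By Lemma 22, $A' = \overline{A}(A)$, and by Lemma 21, $S' = \underline{A}(S)$. Again, any state $(q,F) \in S'$ that is not a state of the game $(\mathcal{G} \restriction S)'$ has no predecessor in $\mathcal{G}' \restriction S'$. Hence, $\mathrm{SolveMPP}(\mathcal{G}' \restriction S') \restriction S = \mathrm{SolveMPP}((\mathcal{G} \restriction S)') \restriction S$. Applying the induction hypothesis to the game $\mathcal{G} \restriction S$, we get that $\mathrm{SolveMPP}((\mathcal{G} \restriction S)') \restriction S = -\mathrm{SymbSolveMPP}(\mathcal{G} \restriction S)$, and the result follows for $\mathcal{G}$. □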



5 Conclusion 

In this paper, we have studied mean-payoff parity games, with an application 
to finding permissive strategies in parity games with penalties. In particular, 
we have established that mean-penalty parity games are not harder to solve 
than mean-payoff parity games: for both kinds of games, the value problem 
is in NP ∩ coNP and can be solved by an exponential algorithm that becomes
pseudo-polynomial when the number of priorities is bounded. 

One complication with both kinds of games is that optimal strategies for 
Player 1 require infinite memory, which makes it hard to synthesise these strategies. 
A suitable alternative to optimal strategies are ε-optimal strategies, which achieve the value of the game up to an error of at most ε. Since finite-memory ε-optimal strategies are guaranteed to exist [2], a challenge for future work is to modify our algorithms so that they compute not only the values of the game but also a finite-memory ε-optimal (multi-)strategy for Player 1.